Module llamapun::util::data_helpers
Helpers with transactional logic related to llamapun::data that does not fit with the main structs.
TODO: May be reorganized better with some more thought, same as path_helpers.
Structs
Options for lexical normalization on an individual word
Functions
Normalization of word lexemes created for the “AMS paragraph classification” experiment
operating on a DNMRange representation
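To illustrate what per-word lexical normalization driven by an options struct can look like, here is a minimal sketch. It is not the crate's implementation: `WordOptionsSketch`, its two fields, and `normalize_word_sketch` are hypothetical names chosen for this example, and the sketch works on a plain `&str` rather than on a DNMRange.

```rust
/// Hypothetical options for normalizing a single word (illustration only,
/// not the options struct exported by this module).
#[derive(Debug, Clone, Copy, Default)]
struct WordOptionsSketch {
    /// Lowercase the word.
    discard_case: bool,
    /// Drop ASCII punctuation characters.
    discard_punct: bool,
}

/// Normalizes one word according to the given (hypothetical) options.
fn normalize_word_sketch(word: &str, options: WordOptionsSketch) -> String {
    let mut out = String::new();
    for c in word.chars() {
        if options.discard_punct && c.is_ascii_punctuation() {
            continue; // skip punctuation when requested
        }
        if options.discard_case {
            // `to_lowercase` may yield more than one char for some inputs
            out.extend(c.to_lowercase());
        } else {
            out.push(c);
        }
    }
    out
}
```

With both flags set, `normalize_word_sketch("Lemma,", ...)` yields `lemma`; with the default options the word is returned unchanged.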
Provides a string for a given heading node, using DNM-enabled word-tokenization
TODO: This is a low-level auxiliary function; we may need to build more user-facing interfaces
if it becomes more widely useful
Check if the given DNM contains valid English+Latin content
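As a rough illustration of this kind of check, the sketch below tests whether the letters of a string all come from the Latin-script Unicode blocks. This rests on assumptions: it reads "English+Latin" as "Latin-script letters only", it takes a plain `&str` instead of a DNM, and `looks_english_latin_sketch` is a hypothetical name. The criteria the crate actually applies may differ.

```rust
/// Returns true if `c` is a letter in Basic Latin through Latin Extended-B
/// (U+0000..=U+024F). This range is an assumption made for the sketch.
fn is_latin_script_letter(c: char) -> bool {
    c.is_alphabetic() && (c as u32) <= 0x024F
}

/// Hypothetical check: the text contains at least one letter, and every
/// letter is a Latin-script letter. Digits, spaces and punctuation are ignored.
fn looks_english_latin_sketch(text: &str) -> bool {
    let mut letters = 0usize;
    for c in text.chars().filter(|c| c.is_alphabetic()) {
        if !is_latin_script_letter(c) {
            return false; // a letter from another script was found
        }
        letters += 1;
    }
    letters > 0
}
```

Under these assumptions, accented Latin text such as "naïve résumé" passes, while Cyrillic or Greek letters cause the check to fail.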
Attempt to recover the “type” of a potentially specialized heading,
e.g. “definition xiii a”->“definition”
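The mapping in the example above can be approximated by stripping trailing enumerators from a lowercased heading. The following is a minimal sketch under that assumption, not the crate's implementation; `heading_type_sketch` and `looks_like_enumerator` are hypothetical names, and the roman-numeral test is a crude heuristic that would also strip trailing words such as "mix".

```rust
/// Returns true if `token` looks like an enumerator (number, single letter,
/// lowercase roman numeral, or bare punctuation) rather than a heading word.
fn looks_like_enumerator(token: &str) -> bool {
    let core: String = token.chars().filter(|c| c.is_alphanumeric()).collect();
    if core.is_empty() {
        return true; // pure punctuation such as "." or "-"
    }
    let is_number = core.chars().all(|c| c.is_ascii_digit());
    let is_single_letter = core.chars().count() == 1;
    let is_roman = core.chars().all(|c| "ivxlcdm".contains(c));
    is_number || is_single_letter || is_roman
}

/// Hypothetical heading normalizer: lowercases the heading and drops
/// trailing enumerator tokens, always keeping at least one token.
fn heading_type_sketch(heading: &str) -> String {
    let lowered = heading.to_lowercase();
    let mut tokens: Vec<&str> = lowered.split_whitespace().collect();
    while tokens.len() > 1 && looks_like_enumerator(tokens[tokens.len() - 1]) {
        tokens.pop();
    }
    tokens.join(" ")
}
```

Only trailing tokens are removed, so a heading like "Proof of Lemma 3" keeps its inner words and becomes "proof of lemma".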