The LLaMaPUn library in Rust
LLaMaPUn (Language and Mathematics Processing and Understanding) provides common data structures and algorithms for semi-structured NLP on math-rich documents.
Modules
Representation, normalization, and utilities for working with AMS markup in LaTeX-derived scientific documents
Data structures and iterators for convenient high-level syntax
The dnm makes it easier to switch between the DOM (Document Object Model) representation and the plain-text representation, which most NLP tools require.
Exposes convenience calls for use from non-Rust applications
A small n-gram library
N-grams are sequences of n consecutive words.
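To make the definition concrete, here is a minimal sketch of extracting n-grams from a tokenized sentence. The helper ngrams is hypothetical and written for illustration only; it is not the API of llamapun's n-gram module. It relies on the standard slice method windows, which yields every contiguous sub-slice of length n.

```rust
/// Hypothetical helper (not llamapun's API): collect all n-grams of `words`,
/// each joined into a single space-separated String.
fn ngrams(words: &[&str], n: usize) -> Vec<String> {
    // `windows(0)` would panic, so treat n = 0 as "no n-grams".
    if n == 0 {
        return Vec::new();
    }
    // `windows(n)` yields every contiguous sub-slice of length n;
    // if n exceeds the number of words, it yields nothing.
    words.windows(n).map(|w| w.join(" ")).collect()
}

fn main() {
    let words: Vec<&str> = "the integral of f over the domain"
        .split_whitespace()
        .collect();
    // Print all bigrams (n = 2) of the sentence.
    println!("{:?}", ngrams(&words, 2));
}
```

A sentence of k words yields k - n + 1 n-grams when n is at most k, and none otherwise.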
Data structures and iterators for rayon-enabled parallel processing, including parallel I/O when walking a corpus, as well as DOM primitives that allow parallel iteration over XPath results, etc.
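The idea of processing a corpus in parallel can be sketched without external dependencies. The sketch below deliberately uses the standard library's std::thread::scope instead of rayon, so it only illustrates the general pattern and not how llamapun's parallel module is implemented. The function parallel_word_counts is hypothetical: it splits the documents into chunks, counts words in each chunk on its own thread, and returns the counts in input order.

```rust
use std::thread;

/// Hypothetical example (std threads, not rayon): count the words of each
/// document concurrently, one scoped thread per chunk of documents.
fn parallel_word_counts(docs: &[String], n_threads: usize) -> Vec<usize> {
    let n_threads = n_threads.max(1);
    // Ceiling division; at least 1 because `chunks(0)` would panic.
    let chunk_size = ((docs.len() + n_threads - 1) / n_threads).max(1);
    thread::scope(|s| {
        let handles: Vec<_> = docs
            .chunks(chunk_size)
            .map(|chunk| {
                // Scoped threads may borrow `docs` without cloning it.
                s.spawn(move || {
                    chunk
                        .iter()
                        .map(|d| d.split_whitespace().count())
                        .collect::<Vec<usize>>()
                })
            })
            .collect();
        // Join in spawn order so results line up with the input documents.
        handles
            .into_iter()
            .flat_map(|h| h.join().expect("worker thread panicked"))
            .collect()
    })
}

fn main() {
    let docs = vec![
        "Let x be real.".to_string(),
        "Then x squared is non-negative.".to_string(),
        "QED".to_string(),
    ];
    println!("{:?}", parallel_word_counts(&docs, 2));
}
```

With rayon, the same per-document mapping would typically be written as a parallel iterator over the documents, and work splitting would be handled by rayon's thread pool instead of fixed chunks.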
A module for pattern matching in mathematical documents
A tiny stopwords library
Stopwords are frequent words such as “the”, “it”, and “then”, which would add too much noise to certain statistical methods.
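The following is a minimal sketch of how a stopword list is typically applied to a token stream. The short STOPWORDS list and the helper remove_stopwords are hypothetical, for illustration only. They do not reproduce llamapun's actual stopword list or API. Matching is case-insensitive, so a sentence-initial "Then" is also dropped.

```rust
use std::collections::HashSet;

/// Hypothetical, deliberately tiny stopword list (not llamapun's list).
const STOPWORDS: &[&str] = &["the", "it", "then", "a", "of", "and", "is"];

/// Keep only the tokens whose lowercase form is not a stopword.
fn remove_stopwords<'a>(tokens: &[&'a str], stopwords: &HashSet<&str>) -> Vec<&'a str> {
    tokens
        .iter()
        .copied()
        .filter(|t| !stopwords.contains(t.to_lowercase().as_str()))
        .collect()
}

fn main() {
    // A HashSet gives O(1) average-time membership checks per token.
    let stopwords: HashSet<&str> = STOPWORDS.iter().copied().collect();
    let tokens: Vec<&str> = "Then the integral of f is finite"
        .split_whitespace()
        .collect();
    println!("{:?}", remove_stopwords(&tokens, &stopwords));
}
```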
Provides functionality for tokenizing sentences and words
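As a rough illustration of what sentence and word tokenization involve, here is a deliberately naive sketch. The functions split_sentences and split_words are hypothetical and are not llamapun's tokenizer. It splits sentences at '.', '!' or '?' when followed by whitespace or the end of the text, and splits words at any non-alphanumeric character. Abbreviations such as "e.g." and decimal numbers are not handled, which is exactly the kind of case a real tokenizer must address.

```rust
/// Naive sentence splitter: break after '.', '!' or '?' when the next
/// character is whitespace or the text ends. Abbreviations are not handled.
fn split_sentences(text: &str) -> Vec<&str> {
    let mut sentences = Vec::new();
    let mut start = 0;
    let chars: Vec<(usize, char)> = text.char_indices().collect();
    for (i, &(idx, c)) in chars.iter().enumerate() {
        if matches!(c, '.' | '!' | '?') {
            let at_boundary = chars.get(i + 1).map_or(true, |&(_, n)| n.is_whitespace());
            if at_boundary {
                // Byte offset just past the terminating punctuation mark.
                let end = idx + c.len_utf8();
                let s = text[start..end].trim();
                if !s.is_empty() {
                    sentences.push(s);
                }
                start = end;
            }
        }
    }
    // Keep any trailing text that lacks final punctuation.
    let rest = text[start..].trim();
    if !rest.is_empty() {
        sentences.push(rest);
    }
    sentences
}

/// Naive word splitter: any non-alphanumeric character separates words.
fn split_words(sentence: &str) -> Vec<&str> {
    sentence
        .split(|c: char| !c.is_alphanumeric())
        .filter(|w| !w.is_empty())
        .collect()
}

fn main() {
    for s in split_sentences("Let x be real. Then x^2 is non-negative!") {
        println!("{:?}", split_words(s));
    }
}
```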
Various useful code snippets
Macros
A handy macro for idiomatic recording in the node_map
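The description above does not show the macro itself, so the following is only a hypothetical sketch of what "recording in a node map" can mean. The struct PlainText, its fields, and the macro record! are invented for illustration and are not llamapun's types or macro. Each call appends a piece of text to a plain-text buffer and records which byte range of that buffer came from which (hypothetical) DOM node id. This is the kind of bookkeeping that lets plain-text offsets be mapped back to the DOM. The word "MathFormula" is simply a placeholder string for a formula node.

```rust
use std::ops::Range;

/// Hypothetical plain-text buffer plus a map from node ids to byte ranges.
struct PlainText {
    text: String,
    node_map: Vec<(String, Range<usize>)>,
}

/// Hypothetical macro: append `$s` to the buffer and record the byte range
/// it occupies under the node id `$node`.
macro_rules! record {
    ($pt:expr, $node:expr, $s:expr) => {{
        let start = $pt.text.len();
        $pt.text.push_str($s);
        let end = $pt.text.len();
        $pt.node_map.push(($node.to_string(), start..end));
    }};
}

fn main() {
    let mut pt = PlainText { text: String::new(), node_map: Vec::new() };
    record!(pt, "p1", "Let ");
    record!(pt, "math1", "MathFormula");
    record!(pt, "p1", " be real.");
    // Map each recorded range back to the text it covers.
    for (node, range) in &pt.node_map {
        println!("{node}: {:?} -> {:?}", range, &pt.text[range.clone()]);
    }
}
```

A macro rather than a function keeps each call site to one line while still expanding to the read-length, push, read-length, record sequence.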