The LLaMaPUn library in Rust

Language and Mathematics Processing and Understanding. Common data structures and algorithms for semi-structured NLP on math-rich documents.

Modules

Representation, normalization and utilities for working with AMS markup in LaTeX-derived scientific documents
Data structures and Iterators for convenient high-level syntax
The dnm can be used for easier switching between the DOM (Document Object Model) representation and the plain text representation, which is needed for most NLP tools.
Expose convenience calls to be used from non-Rust applications
A small n-gram library. N-grams are sequences of n consecutive words.
Data structures and Iterators for rayon-enabled parallel processing, including parallel I/O when walking a corpus, as well as DOM primitives that allow parallel iterators on XPath results, etc.
A module for pattern matching in mathematical documents
A tiny stopwords library. Stopwords are frequent words like “the”, “it”, “then”, which would add too much noise to certain statistical methods.
Provides functionality for tokenizing sentences and words
Various useful code snippets
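
To make the n-gram and stopword descriptions above concrete, here is a minimal sketch that uses only the Rust standard library. It is not the crate's API: the function names `words`, `remove_stopwords` and `count_ngrams` are hypothetical, the tokenizer is a naive split on non-alphanumeric characters, and the stopword set is just the three examples quoted above.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical helper: split text into lowercase word tokens,
/// treating every non-alphanumeric character as a separator.
fn words(text: &str) -> Vec<String> {
    text.split(|c: char| !c.is_alphanumeric())
        .filter(|w| !w.is_empty())
        .map(|w| w.to_lowercase())
        .collect()
}

/// Hypothetical helper: drop every token that appears in the stopword set.
fn remove_stopwords(tokens: Vec<String>, stopwords: &HashSet<&str>) -> Vec<String> {
    tokens
        .into_iter()
        .filter(|t| !stopwords.contains(t.as_str()))
        .collect()
}

/// Hypothetical helper: count each sequence of `n` consecutive words.
/// The n-gram is stored as its words joined by single spaces.
fn count_ngrams(tokens: &[String], n: usize) -> HashMap<String, usize> {
    let mut counts = HashMap::new();
    if n == 0 {
        // `windows(0)` would panic, so an empty result is returned instead.
        return counts;
    }
    for window in tokens.windows(n) {
        *counts.entry(window.join(" ")).or_insert(0) += 1;
    }
    counts
}

fn main() {
    // Assumed stopword set: only the examples named in the description.
    let stopwords: HashSet<&str> = ["the", "it", "then"].iter().copied().collect();

    let tokens = words("The limit exists, then the limit is unique.");
    let content = remove_stopwords(tokens, &stopwords);
    let bigrams = count_ngrams(&content, 2);

    println!("content words: {:?}", content);
    println!("distinct bigrams: {}", bigrams.len());
}
```

Filtering stopwords before counting keeps frequent function words from dominating the n-gram statistics, which is the noise problem the stopwords description refers to. Whether the crate itself filters before or after counting is not stated here.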

Macros

A handy macro for idiomatic recording in the node_map