pub struct DNM {
pub plaintext: String,
pub byte_offsets: Vec<usize>,
pub parameters: DNMParameters,
pub root_node: RoNode,
pub node_map: HashMap<usize, (usize, usize)>,
pub runtime: RuntimeParseData,
pub back_map: Vec<(RoNode, i32)>,
}
The DNM is essentially a wrapper around the plaintext representation of the document, which facilitates mapping pieces of plaintext back to the DOM.
This breaks if the DOM is changed after the DNM has been generated!
Fields
plaintext: String
The plaintext
byte_offsets: Vec<usize>
Since the plaintext is UTF-8, the byte offsets of its characters
parameters: DNMParameters
The options for generation
root_node: RoNode
The root node of the underlying xml tree
node_map: HashMap<usize, (usize, usize)>
Maps nodes to plaintext offsets
runtime: RuntimeParseData
A runtime object used for holding auxiliary state
back_map: Vec<(RoNode, i32)>
Maps an offset to the corresponding node and the offset within that node. An offset of -1 means that the plaintext offset corresponds to the entire node; this is used, for example, when a node is replaced by a token.
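The interplay between `plaintext`, `byte_offsets` and `back_map` can be illustrated with a minimal, standalone sketch. It does not use llamapun itself: `NodeId` is a hypothetical stand-in for `RoNode`, the helper names are invented for illustration, and it assumes that `byte_offsets[i]` holds the byte position of the i-th character of the plaintext.

```rust
/// Hypothetical stand-in for `RoNode`; the real type wraps a libxml node.
type NodeId = usize;

/// Byte offset of every character in `text` (the assumed layout of `byte_offsets`).
fn char_byte_offsets(text: &str) -> Vec<usize> {
    text.char_indices().map(|(byte_pos, _)| byte_pos).collect()
}

/// Slice `text` by the character positions `[start, end)` using the offset table.
fn slice_by_chars<'a>(text: &'a str, offsets: &[usize], start: usize, end: usize) -> &'a str {
    let from = offsets[start];
    // A position one past the last character maps to the end of the string.
    let to = offsets.get(end).copied().unwrap_or(text.len());
    &text[from..to]
}

/// Interpret a `back_map`-style entry: an offset of -1 refers to the whole node.
fn describe(entry: (NodeId, i32)) -> String {
    match entry {
        (node, -1) => format!("node {} (entire node)", node),
        (node, offset) => format!("node {} at offset {}", node, offset),
    }
}

fn main() {
    // "naïve " comes from a text node; "MathFormula" is a token replacing a math node.
    let plaintext = "na\u{ef}ve MathFormula";
    let offsets = char_byte_offsets(plaintext);

    // 'ï' occupies two bytes, so character positions and byte positions diverge.
    println!("{}", slice_by_chars(plaintext, &offsets, 0, 5));
    println!("{}", slice_by_chars(plaintext, &offsets, 6, 17));

    // Six characters of node 1, then eleven token characters that all point at node 2.
    let mut back_map: Vec<(NodeId, i32)> = (0..6).map(|i| (1, i)).collect();
    back_map.extend(std::iter::repeat((2, -1)).take(11));
    println!("{}", describe(back_map[2]));
    println!("{}", describe(back_map[8]));
}
```

Keeping a separate offset table avoids rescanning the string each time a character range is turned into a string slice.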
Implementations
impl DNM
pub fn to_c14n_basic(&self) -> String
Our linguistic canonical form only includes 1) the node name, 2) the class attribute and 3) the textual content.
- excludes certain experimental markup, such as all math annotation elements
- excludes whitespace nodes and comment nodes
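As a rough illustration of these rules, the following standalone sketch canonicalizes a toy tree. It is not llamapun's implementation: the `Node` enum, the tag-like output format and the list of skipped element names are all assumptions made for this example, since the documentation above does not specify the exact serialization.

```rust
/// Toy DOM used only for this illustration; llamapun works on libxml nodes.
enum Node {
    Element { name: String, class: Option<String>, children: Vec<Node> },
    Text(String),
    /// Comment content is omitted because it is never serialized.
    Comment,
}

/// Element names treated as math annotation markup in this sketch (an assumption).
const SKIPPED: [&str; 2] = ["annotation", "annotation-xml"];

/// Serialize only the node name, the class attribute and the textual content.
fn canonical(node: &Node) -> String {
    match node {
        // Comment nodes are dropped entirely.
        Node::Comment => String::new(),
        // Whitespace-only text nodes are dropped; other text is kept verbatim.
        Node::Text(t) => {
            if t.trim().is_empty() { String::new() } else { t.clone() }
        }
        // Skipped markup is dropped together with its subtree.
        Node::Element { name, .. } if SKIPPED.contains(&name.as_str()) => String::new(),
        Node::Element { name, class, children } => {
            let body: String = children.iter().map(canonical).collect();
            match class {
                Some(c) => format!("<{} class=\"{}\">{}</{}>", name, c, body, name),
                None => format!("<{}>{}</{}>", name, body, name),
            }
        }
    }
}

/// A small sample paragraph with a comment, a whitespace node and an annotation.
fn sample() -> Node {
    let text = |s: &str| Node::Text(s.to_string());
    let elem = |name: &str, children: Vec<Node>| Node::Element {
        name: name.to_string(),
        class: None,
        children,
    };
    Node::Element {
        name: "p".to_string(),
        class: Some("ltx_p".to_string()),
        children: vec![
            text("Let "),
            Node::Comment,
            text("  \n"),
            elem("annotation", vec![text("x")]),
            elem("em", vec![text("x")]),
        ],
    }
}

fn main() {
    println!("{}", canonical(&sample()));
}
```

Two documents that differ only in comments, whitespace nodes or skipped markup yield the same string under this scheme, which is what makes hashing the canonical form useful for comparing content.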
pub fn node_c14n_basic(&self, node: RoNode) -> String
Canonicalize a single node of choice
pub fn to_hash_basic(&self) -> String
Obtain an MD5 hash from the canonical string of the entire DOM
pub fn node_hash_basic(&self, node: RoNode) -> String
Obtain an MD5 hash from the canonical string of a Node
impl DNM
pub fn new(root_node: RoNode, parameters: DNMParameters) -> DNM
Creates a DNM for the given root node
pub fn from_str(
text: &str,
params_opt: Option<DNMParameters>
) -> Result<(Document, Self), Box<dyn Error>>
Use the DNM abstraction over a plaintext utterance, assuming it stands for a single paragraph
pub fn from_ams_paragraph_str(
text: &str,
params: Option<DNMParameters>
) -> Result<(Document, Self), Box<dyn Error>>
Rebuilds a llamapun-generated tokenized plaintext into a DNM. This is quite specific to the AMS paragraph generation.
pub fn get_range_of_node(
&self,
node: RoNode
) -> Result<DNMRange<'_>, Box<dyn Error>>
Get the plaintext range of a node
pub fn get_range(&self) -> Result<DNMRange<'_>, Box<dyn Error>>
Get the range representing the full DNM
pub fn get_plaintext(&self) -> &str
Get the underlying text for this DNM