I am attempting to create a text parser that will allow limited user defined substitution rules.
To wit I am reading in codes from a DOS ASCII file in which the ordering is significant and the line numbering must be maintained. With this input I want to apply user defined substitution rules (exchange this string for that string, if we see this string followed by that string perform this translation, etc).
The output is also a formatted DOS ASCII file.
Most of the rules are straight forward replace tit for tat type replacements, however, there are situations where I want to define a rule like if A is followed by B at any point in the future, apply this rule.
To do this I'm using a tree of structs as such:
struct node {
list<string> common; // the text which is not affected by conditions
string condition; // matching this string selects the left, otherwise the right
node *lptr, *rptr; // pointers to the child nodes, if needed
};
Any time I encounter such a rule I can maintain the output with the rule both omitted and applied, delaying the decision of which to use until it's resolved unambiguously.
It's somewhat memory wasteful, but seems the best way to avoid having to pass over the input data twice (the size of the input data is unknown but probably less than 1 meg).
Of course such a case might exist that a different rule of this type triggers within one or both of the child nodes, so that's why a tree structure.
There is no restriction that the children must be decided before their parents, it may be that the parent is only decidable on one branch of the child. Encountering EOF would decide any undecided children in the false direction.
So it's clear I must be careful when rewinding and collapsing nodes.
Is there a simpler solution to this general problem? Is there a way to use standard library containers in a more efficient way than my tree presents?