I learned that in pyparsing, you can name an element/group/node by doing this:
token = pyparsing.Literal("Foobar")("element_name_here")
So, I made a sample program to test it out:
import pyparsing as pp
Prefix = pp.Word(pp.nums)("Prefix")
Name = pp.Literal("FOOBAR")("Name")
Modifier = pp.Word(pp.alphas)("Modifier")
Modifier_Group = pp.Group(pp.OneOrMore(Modifier))("Modifier_Group")
Sentence = pp.Group(pp.Optional(Prefix) + Name + Modifier_Group)("Sentence")
out = Sentence.parseString("123 FOOBAR testA testB")
Then I tried to get the output together with these token names.
First, I tried this:
>>> print out
[['123', 'FOOBAR', ['testA', 'testB']]]
...but that doesn't get me the token names.
I then tried doing the following:
>>> print out.items()
[('Sentence', (['123', 'FOOBAR', (['testA', 'testB'], {'Modifier': [('testA', 0),
('testB', 1)]})], {'Modifier_Group': [((['testA', 'testB'], {'Modifier': [('testA', 0),
('testB', 1)]}), 2)], 'Prefix': [('123', 0)], 'Name': [('FOOBAR', 1)]}))]
>>> print dict(out)
{'Sentence': (['123', 'FOOBAR', (['testA', 'testB'], {'Modifier': [('testA', 0),
('testB', 1)]})], {'Modifier_Group': [((['testA', 'testB'], {'Modifier': [('testA', 0),
('testB', 1)]}), 2)], 'Prefix': [('123', 0)], 'Name': [('FOOBAR', 1)]})}
>>> import collections
>>> print collections.OrderedDict(out)
OrderedDict([('Sentence', (['123', 'FOOBAR', (['testA', 'testB'], {'Modifier': [
('testA', 0), ('testB', 1)]})], {'Modifier_Group': [((['testA', 'testB'],
{'Modifier': [('testA', 0), ('testB', 1)]}), 2)], 'Prefix': [('123', 0)],
'Name': [('FOOBAR', 1)]}))])
...but these gave me a peculiar mixture of dicts, lists, and tuples, and I couldn't figure out how to walk through them. Then I tried this:
>>> print out.asXML()
<Sentence>
  <Sentence>
    <Prefix>123</Prefix>
    <Name>FOOBAR</Name>
    <Modifier_Group>
      <Modifier>testA</Modifier>
      <Modifier>testB</Modifier>
    </Modifier_Group>
  </Sentence>
</Sentence>
...and that got me EXACTLY what I wanted, except that it's XML rather than a Python data structure that I can easily manipulate. Is there some way to get such a data structure directly (without having to parse the XML)?
I did find a solution that returns a nested dict, but dicts in Python are unordered (and I want the tokens in order), so it isn't a solution for me.
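To make the target concrete, here is a hand-written sketch of one Python data structure that carries the same information as the inner <Sentence> element of the asXML() output: a nested list of (name, value) pairs, where value is either a token string or another such list. The names `tree` and `walk` are hypothetical, and nothing here is produced by pyparsing; it only illustrates the shape being asked for, using plain Python.

```python
# Hypothetical target shape, written out by hand to mirror the inner
# <Sentence> element of the asXML() output above. It is NOT generated by
# pyparsing. Each named node is a (name, value) pair: value is either a
# string (a leaf token) or a list of further pairs (a group).
# Lists keep the tokens in their original order.
tree = [
    ("Sentence", [
        ("Prefix", "123"),
        ("Name", "FOOBAR"),
        ("Modifier_Group", [
            ("Modifier", "testA"),
            ("Modifier", "testB"),
        ]),
    ]),
]


def walk(nodes, depth=0):
    """Yield (depth, name, leaf_value_or_None) for every node, in order."""
    for name, value in nodes:
        if isinstance(value, list):
            # A group: report the group itself, then recurse into it.
            yield depth, name, None
            for item in walk(value, depth + 1):
                yield item
        else:
            # A leaf token.
            yield depth, name, value


# Print the tree as an indented outline, one node per line.
for depth, name, value in walk(tree):
    label = name if value is None else "%s: %s" % (name, value)
    print("  " * depth + label)
```

A list of pairs is used here instead of a nested OrderedDict because the name Modifier occurs twice at the same level, and a dict can hold each key only once.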