The reason your list is not being parsed lies in this expression:
element = word | obj | list
Because you are checking for word before list (which is a really awful
variable name when working in Python, btw, since it shadows the built-in
list type), the leading "foo" in "foo,bar" is processed as a word: '|' is
an eager operator that takes the first alternative that matches.
You can fix this by changing the order of expressions in element:
element = list | word | obj
Or by using '^' instead of '|'. '^' is a patient operator - it evaluates
all of the alternative expressions and selects the longest match.
element = word ^ obj ^ list
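To see the difference in isolation, here is a minimal sketch that runs both operators over the same input. The names wd, eager and patient are mine, not from your code, and it assumes pyparsing is installed:

```python
from pyparsing import Word, alphas, delimitedList

wd = Word(alphas)

# '|' builds a MatchFirst: it stops at the first alternative that matches
eager = wd | delimitedList(wd)
# '^' builds an Or: it tries every alternative and keeps the longest match
patient = wd ^ delimitedList(wd)

eager_result = eager.parseString("foo,bar").asList()
patient_result = patient.parseString("foo,bar").asList()
print(eager_result)    # ['foo']
print(patient_result)  # ['foo', 'bar']
```

The eager version stops after "foo" because the plain word alternative already succeeded, so the list alternative is never tried.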
With either of these changes, your output now becomes:
word
list
word
list
obj
word
word
list
Why all the list matching? Because delimitedList will match a single item:
>>> wd = Word(alphas)
>>> wdlist = delimitedList(wd)
>>> print(wdlist.parseString('xyz'))
['xyz']
If you want to enforce that lists must have > 1 item, then you can add a
condition parse action:
>>> wdlist.addCondition(lambda t: len(t)>1)
>>> print(wdlist.parseString('xyz'))
... raises exception ...
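A failed condition is reported as an ordinary ParseException, so it can be caught like any other parse failure. A small sketch, again assuming pyparsing is installed; the variable names accepted and rejected are only for illustration:

```python
from pyparsing import ParseException, Word, alphas, delimitedList

wd = Word(alphas)
# addCondition returns the expression itself, so it can be chained
wdlist = delimitedList(wd).addCondition(lambda t: len(t) > 1)

# Two items satisfy the condition
accepted = wdlist.parseString("abc,def").asList()
print(accepted)

# A single item matches the grammar but fails the condition
rejected = False
try:
    wdlist.parseString("xyz")
except ParseException as err:
    rejected = True
    print("rejected:", err)
```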
Also, delimitedLists do not automatically group their results:
>>> print((wd + wdlist).parseString('xyz abc,def'))
['xyz', 'abc', 'def']
If you want to keep the list contents as a list in the results, then wrap
the list expression in a Group:
>>> print((wd + Group(wdlist)).parseString('xyz abc,def'))
['xyz', ['abc', 'def']]
Here is my updated version of your process() method:
def process(string):
    print(string)
    word = ~Literal('OBJ') + Word(alphas.lower())
    word.addParseAction(lambda s,l,t: found_word(s, l, t))
    word.setName("word")
    obj = Literal('OBJ') + Word(alphas.lower())
    obj.setName("obj")
    obj.addParseAction(lambda s,l,t: found_obj(s, l, t))
    item = word | obj
    list = Group(pyparsing.delimitedList(item, delim=',')
                 .addCondition(lambda t: len(t)>1))
    list.setName("list")
    list.addParseAction(lambda s,l,t: found_list(s, l, t))
    element = obj | list | word
    parser = pyparsing.OneOrMore(element)
    parser.searchString(string).pprint()
Which gives this output:
foo bar OBJ baz foo,bar
word
word
word
word
obj
word
word
list
[['foo', 'bar', 'OBJ', 'baz', ['foo', 'bar']]]
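If you want to run this without your original callbacks, here is a self-contained sketch of the same grammar. The bodies of found_word, found_obj and found_list are my assumption (they only record which callback fired), list is renamed to list_expr to avoid shadowing the built-in, and process returns the results instead of printing them:

```python
from pyparsing import Group, Literal, OneOrMore, Word, alphas, delimitedList

events = []  # hypothetical stand-in for whatever the real callbacks do


def found_word(s, l, t):
    events.append("word")


def found_obj(s, l, t):
    events.append("obj")


def found_list(s, l, t):
    events.append("list")


def process(string):
    lowercase = alphas.lower()

    word = ~Literal("OBJ") + Word(lowercase)
    word.setName("word")
    word.addParseAction(found_word)

    obj = Literal("OBJ") + Word(lowercase)
    obj.setName("obj")
    obj.addParseAction(found_obj)

    item = word | obj
    # Require at least two items, and keep them together as a sub-list
    list_expr = Group(delimitedList(item, delim=",")
                      .addCondition(lambda t: len(t) > 1))
    list_expr.setName("list")
    list_expr.addParseAction(found_list)

    element = obj | list_expr | word
    parser = OneOrMore(element)
    return parser.searchString(string).asList()


results = process("foo bar OBJ baz foo,bar")
print(results)
print(events)
```

Passing the callbacks directly to addParseAction behaves the same as wrapping them in a lambda that forwards s, l and t.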
You'll note that I added setName() calls for each of your expressions. That
is so that I could add setDebug() to get pyparsing's debug output. By adding:
word.setDebug()
obj.setDebug()
list.setDebug()
before running the parser, you get this debugging output. It may help explain
why you are getting the replicated "word"s in your sample output.
foo bar OBJ baz foo,bar
Match obj at loc 0(1,1)
Exception raised:Expected "OBJ", found 'f' (at char 0), (line:1, col:1)
Match list at loc 0(1,1)
Match word at loc 0(1,1)
word
Matched word -> ['foo']
Exception raised:failed user-defined condition, found 'f' (at char 0), (line:1, col:1)
Match word at loc 0(1,1)
word
Matched word -> ['foo']
Match obj at loc 3(1,4)
Exception raised:Expected "OBJ", found 'b' (at char 4), (line:1, col:5)
Match list at loc 3(1,4)
Match word at loc 4(1,5)
word
Matched word -> ['bar']
Exception raised:failed user-defined condition, found 'b' (at char 4), (line:1, col:5)
Match word at loc 3(1,4)
word
Matched word -> ['bar']
Match obj at loc 7(1,8)
obj
Matched obj -> ['OBJ', 'baz']
Match obj at loc 15(1,16)
Exception raised:Expected "OBJ", found 'f' (at char 16), (line:1, col:17)
Match list at loc 15(1,16)
Match word at loc 16(1,17)
word
Matched word -> ['foo']
Match word at loc 20(1,21)
word
Matched word -> ['bar']
list
Matched list -> [['foo', 'bar']]
Match obj at loc 23(1,24)
Exception raised:Expected "OBJ", found end of text (at char 23), (line:1, col:24)
Match list at loc 23(1,24)
Match word at loc 23(1,24)
Exception raised:Expected W:(abcd...), found end of text (at char 23), (line:1, col:24)
Match obj at loc 23(1,24)
Exception raised:Expected "OBJ", found end of text (at char 23), (line:1, col:24)
Exception raised:Expected {word | obj}, found end of text (at char 23), (line:1, col:24)
Match word at loc 23(1,24)
Exception raised:Expected W:(abcd...), found end of text (at char 23), (line:1, col:24)
Match obj at loc 23(1,24)
Exception raised:Expected "OBJ", found end of text (at char 23), (line:1, col:24)
Match list at loc 23(1,24)
Match word at loc 23(1,24)
Exception raised:Expected W:(abcd...), found end of text (at char 23), (line:1, col:24)
Match obj at loc 23(1,24)
Exception raised:Expected "OBJ", found end of text (at char 23), (line:1, col:24)
Exception raised:Expected {word | obj}, found end of text (at char 23), (line:1, col:24)
Match word at loc 23(1,24)
Exception raised:Expected W:(abcd...), found end of text (at char 23), (line:1, col:24)
[['foo', 'bar', 'OBJ', 'baz', ['foo', 'bar']]]
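The trace shows word's parse action firing once inside the failed list attempt and then again when the same text is re-parsed as a plain word. A stripped-down sketch of that effect, assuming pyparsing is installed with default settings (packrat caching not enabled); the calls list and the other names are hypothetical, used only to count invocations:

```python
from pyparsing import Group, Word, alphas, delimitedList

calls = []  # records every time the word parse action runs

word = Word(alphas).setName("word")
word.addParseAction(lambda t: calls.append(t[0]))

# The list requires at least two items, so a lone word fails the condition
word_list = Group(delimitedList(word).addCondition(lambda t: len(t) > 1))
element = word_list | word

result = element.parseString("foo").asList()
print(result)  # ['foo']
print(calls)   # ['foo', 'foo']
```

The action runs as soon as word matches, before pyparsing knows whether the enclosing list will succeed, so a callback with side effects such as printing will show the duplicate.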