I'm trying to use pyparsing==2.4.7 to parse search queries that have a field:value format.
Examples of the strings I want to parse include:
field1:value1
field1:value1 field2:value2
field1:value1 AND field2:value2
(field1:value1a OR field1:value1b) field2:value2
(field1:value1a | field1:value1b) & (field2:value2a | field2:value2b)
A few things to note:
- I'm using OR and | to both mean "OR"; likewise, AND and & mean the same thing
- If there is no boolean operator between conditions, then an AND is implied
- Queries can be nested hierarchically with parentheses
- The values (on the right side of the :) will never have spaces
I have written a parser that works (code is based on this SO answer), but only when every operator (AND and OR) is written out explicitly:
import pyparsing as pp
from pyparsing import Word, alphas, alphanums, White, Combine, OneOrMore, Literal, oneOf
field_name = Word(alphanums).setResultsName('field_name')
search_value = Word(alphanums + '-').setResultsName('search_value')
operator = Literal(':')
query = field_name + operator + search_value
AND = oneOf(['AND', 'and', '&', ' '])
OR = oneOf(['OR', 'or', '|'])
NOT = oneOf(['NOT', 'not', '!'])
query_expr = pp.infixNotation(query, [
    (NOT, 1, pp.opAssoc.RIGHT, ),
    (AND, 2, pp.opAssoc.LEFT, ),
    (OR, 2, pp.opAssoc.LEFT, ),
])
class ComparisonExpr:
    def __init__(self, tokens):
        self.tokens = tokens
    def __str__(self):
        return "Comparison:('field': {!r}, 'operator': {!r}, 'value': {!r})".format(*self.tokens)
    def __repr__(self):
        return self.__str__()
query.addParseAction(ComparisonExpr)
sample = "(field1:value1a | field1:value1b) & (field2:value2a | field2:value2b)"
result = query_expr.parseString(sample).asList()
from pprint import pprint
pprint(result)
[[[Comparison:('field': 'field1', 'operator': ':', 'value': 'value1a'),
   '|',
   Comparison:('field': 'field1', 'operator': ':', 'value': 'value1b')],
  '&',
  [Comparison:('field': 'field2', 'operator': ':', 'value': 'value2a'),
   '|',
   Comparison:('field': 'field2', 'operator': ':', 'value': 'value2b')]]]
However, if I try it with a sample that is missing an operator, the parser appears to stop at the point where an operator would be expected:
sample = "(field1:value1a | field1:value1b) (field2:value2a | field2:value2b)"
result = query_expr.parseString(sample).asList()
from pprint import pprint
pprint(result)
[[Comparison:('field': 'field1', 'operator': ':', 'value': 'value1a'),
  '|',
  Comparison:('field': 'field1', 'operator': ':', 'value': 'value1b')]]
Is there a way to make whitespace an "implicit AND" if there is no operator separating terms?