2

I am looking for a regular expression in Python to match a logical expression.

I want to match the whole string, not search within it.

I just want to match two literals joined by a logical operator (AND|OR), with the tokens separated by spaces.

Example:

The following conditions should match:

  • (abc AND xyz)
  • (abc AND 123)
  • (abc AND 123.456)
  • (123 AND 123.456)
  • (.001 AND 1)

The same applies to the OR operator:

  • (abc OR xyz)
  • (abc OR 123)
  • (abc OR 123.456)
  • (123 OR 123.456)
  • (.001 OR 1)

The following conditions should NOT match:

  • (AND AND AND)
  • (AND AND abc)
  • (123 AND AND)
  • (OR AND OR)
  • (AND OR OR)

I tried the following without success: 'AND AND abc' still matches, although 'abc AND AND' does not.

  • ^((?!AND$|OR$)\w+|\d*\.\d+|\d+)\s+(AND|OR)\s+((?!AND$|OR$)\w+|\d*\.\d+|\d+)$

code:

import re

p = re.compile(r'(^((?!AND$|OR$)\w+|\d*\.\d+|\d+)\s+(AND|OR)\s+((?!AND$|OR$)\w+|\d*\.\d+|\d+)$)')
p.match('AND AND abc')  # unexpectedly returns a match

Thanks in advance for all your help!

Ooty
  • Most logical languages (including the one you described) are context-free languages, which are more powerful than regular languages. This task is impossible with regex, and if it is possible through some property of real-world regex, you are still shooting yourself in the foot. – L3viathan Jan 29 '16 at 00:39
  • @Ooty, are the strings really surrounded by parentheses? – Alan Moore Jan 29 '16 at 01:21
  • Hi all, thank you very much for the swift response. @AlanMoore, the string may or may not be surrounded by parentheses. – Ooty Jan 29 '16 at 02:22
  • I can strip the parentheses anyway, so it's not needed. – Ooty Jan 29 '16 at 02:31

3 Answers

1

You've got a whole lot going on there.
The best thing to do is to move the sequential operator check to the beginning
using a lookahead assertion. The rest just matches a form.

Note that you could also add a whitespace boundary check within the operator
check if you think ANDxxx could be an operand.

Update: at the OP's request, added an optional + or - before each operand and
optional whitespace before and after the expression.

^(?!.*(?<!\S)(?:AND|OR)\s+(?:AND|OR)(?!\S))\s*([+-]?(?:\w+|(?:\d+(?:\.\d*)?|\.\d+)))\s+(AND|OR)\s+([+-]?(?:\w+|(?:\d+(?:\.\d*)?|\.\d+)))\s*$

Expanded

 ^ 
 (?!                      # Lookahead, no sequential operators
      .* 
      (?<! \S )                # WSP boundary
      (?: AND | OR )
      \s+ 
      (?: AND | OR )
      (?! \S )                 # WSP boundary
 )                        # End lookahead

 \s*                      # Optional WSP
 (                        # (1 start), Operand 1
      [+-]?                    # Optional + or -
      (?:
           \w+                      # Words
        |                         # or,
           (?:                      # Decimal number
                \d+ 
                (?: \. \d* )?
             |  \. \d+ 
           )
      )
 )                        # (1 end), Operand 1
 \s+ 
 ( AND | OR )             # (2), Operator AND / OR
 \s+ 
 (                        # (3 start), Operand 2
      [+-]?                    # Optional + or -
      (?:
           \w+                      # Words
        |                         # or,
           (?:                      # Decimal number
                \d+ 
                (?: \. \d* )?
             |  \. \d+ 
           )
      )
 )                        # (3 end), Operand 2
 \s*                      # Optional WSP
 $ 

Input test

  abc AND -xyz  

Output

 **  Grp 0 -  ( pos 0 , len 16 ) 
  abc AND -xyz  
 **  Grp 1 -  ( pos 2 , len 3 ) 
abc
 **  Grp 2 -  ( pos 6 , len 3 ) 
AND
 **  Grp 3 -  ( pos 10 , len 4 ) 
-xyz
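
For Python users, here is a minimal sketch of the compact regex above compiled with the standard `re` module. The helper name `parse` and the two test lists are illustrative assumptions, not part of the original answer. The inputs are the question's cases with the parentheses already stripped, as the OP said they can be.

```python
import re

# The compact pattern from this answer, split across lines for readability only.
PATTERN = re.compile(
    r"^(?!.*(?<!\S)(?:AND|OR)\s+(?:AND|OR)(?!\S))"  # no two operators in a row
    r"\s*([+-]?(?:\w+|(?:\d+(?:\.\d*)?|\.\d+)))"    # group 1: operand 1
    r"\s+(AND|OR)\s+"                               # group 2: operator
    r"([+-]?(?:\w+|(?:\d+(?:\.\d*)?|\.\d+)))\s*$"   # group 3: operand 2
)


def parse(text):
    """Hypothetical helper: return (operand1, operator, operand2) or None."""
    m = PATTERN.match(text)
    return m.groups() if m else None


should_match = [
    "abc AND xyz", "abc AND 123", "abc AND 123.456", "123 AND 123.456",
    ".001 AND 1", "abc OR xyz", ".001 OR 1", "  abc AND -xyz  ",
]
should_not_match = [
    "AND AND AND", "AND AND abc", "123 AND AND", "OR AND OR", "AND OR OR",
]

for s in should_match + should_not_match:
    print(repr(s), "->", parse(s))
```

Because the pattern is anchored with `^` and `$`, `match` checks the whole string here rather than searching inside it.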
  • @Sln, thanks for the great solution. All of my test cases worked except ' abc AND xyz ' and ' abc AND xyz ', where there is a space at the front or back of the string. It's not a big deal; we can always trim the spaces before running the regex. I tweaked it a bit to support **+/-** in front of the word/number. Hope this is correct... ** '^((?!.*(?:AND|OR)\s+(?:AND|OR))[+-]?(\w+|(?:\d+(?:\.\d*)?|\.\d+))\s+(?:AND|OR)\s+[+-]?(\w+|(?:\d+(?:\.\d*)?|\.\d+)))$'** – Ooty Jan 29 '16 at 05:39
  • @Ooty - No problem. Use the updated one. I would be careful with how you use the long regex in html. The regex in your comments have some _invisible_ zero width characters. This won't work in a regex engine. This section `(?:AND|OR)\‌​s+[+-]?` is really this `(?:AND|OR)\U+200CU+200Bs+[+-]?` where `U+200B` = zero width space and `U+200C` = zero width non-joiner. –  Jan 29 '16 at 17:09
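
A rough sketch of how the invisible characters mentioned in the last comment could be detected and removed in Python before a pasted pattern is compiled. The function names `find_invisible` and `strip_invisible` are made up for illustration. The sketch assumes that dropping every Unicode format character (category `Cf`, which includes U+200B and U+200C) is acceptable for the pattern at hand.

```python
import unicodedata


def find_invisible(pattern):
    """List (index, code point) for each format character (category Cf)."""
    return [
        (i, f"U+{ord(ch):04X}")
        for i, ch in enumerate(pattern)
        if unicodedata.category(ch) == "Cf"
    ]


def strip_invisible(pattern):
    """Return the pattern with all format characters removed."""
    return "".join(ch for ch in pattern if unicodedata.category(ch) != "Cf")


# The fragment from the comment: a backslash, then U+200C and U+200B, then "s+".
pasted = "(?:AND|OR)\\\u200c\u200bs+[+-]?"
print(find_invisible(pasted))
print(strip_invisible(pasted))
```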
0

I cooked one up, hope this helps:

r'^\((?!(AND|OR)\s)[^\s]+\s+(AND|OR)\s+(?!(AND|OR)\s*\))[^\s]+\)$'

Demo (expressions is a list of your test strings):

>>> import re
>>> def trymatch(expressions, regex):
...     for e in expressions:
...         if re.search(regex, e):
...             print('matched ' + e)
...         else:
...             print('did not match ' + e)
... 
>>> 
>>> regex = r'^\((?!(AND|OR)\s)[^\s]+\s+(AND|OR)\s+(?!(AND|OR)\s*\))[^\s]+\)$'
>>> trymatch(expressions, regex)
matched (abc AND xyz)
matched (abc AND 123)
matched (abc AND 123.456)
matched (123 AND 123.456)
matched (.001 AND 1)
matched (abc OR xyz)
matched (abc OR 123)
matched (abc OR 123.456)
matched (123 OR 123.456)
matched (.001 OR 1)
did not match (AND AND AND)
did not match (AND AND abc)
did not match (123 AND AND)
did not match (OR AND OR)
did not match (AND OR OR)
timgeb
  • `\s*\)`->`\b` is probably less duplication. – ivan_pozdeev Jan 29 '16 at 01:33
  • I don't think the strings really start and end with parentheses. If they did, the OP's own regex wouldn't match any of them. – Alan Moore Jan 29 '16 at 01:37
  • @AlanMoore Hmm, I think OP forgot to escape his parentheses, wouldn't make much sense to have them around the whole expression. – timgeb Jan 29 '16 at 02:02
  • @timgeb, thanks for the quick solution. But there is a slight glitch: it will match (AND.AND AND AND.AND) or (-AND AND -OR) as well. Sorry, the test cases were not complete. It should accept numbers/floats as literals. Can we improve this using something like [+-]?[\w|\d*\.\d|\d]+ ? – Ooty Jan 29 '16 at 03:15
  • @Ooty no problem, looks like you got a complete answer in the meantime – timgeb Jan 29 '16 at 10:48
-1

As per L3viathan's comment on the question, this is impossible with regex if you want to match nested expressions of any level: regular expressions cannot match recursive structures.

If you only want to match single expressions (a <op> b/<op> a) though, a regex is fine, and the other answer is an example.

The nested case can, however, be handled by Perl's extended patterns - which are not regular expressions in the mathematical sense but formal grammar definitions. The above link has an example of these, too.
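
In Python, where the `re` module has no recursive patterns, one alternative for the nested case is a small hand-written recursive-descent parser. The following is only a sketch under assumptions: the grammar (every expression is exactly `operand OP operand`, and an operand is a literal or a parenthesized expression), the literal syntax, and the name `is_valid` are illustrative choices, not something the question specifies.

```python
import re

TOKEN = re.compile(r"[()]|[^\s()]+")                      # parens or runs of non-space
LITERAL = re.compile(r"[+-]?(?:\w+|\d+(?:\.\d*)?|\.\d+)")  # word or decimal number
OPERATORS = {"AND", "OR"}


def is_valid(text):
    """Check text against: expr := operand OP operand; operand := literal | '(' expr ')'."""
    tokens = TOKEN.findall(text)
    pos = 0

    def operand():
        nonlocal pos
        if pos < len(tokens) and tokens[pos] == "(":
            pos += 1
            if not expression():
                return False
            if pos < len(tokens) and tokens[pos] == ")":
                pos += 1
                return True
            return False
        if (pos < len(tokens) and tokens[pos] not in OPERATORS
                and LITERAL.fullmatch(tokens[pos])):
            pos += 1
            return True
        return False

    def expression():
        nonlocal pos
        if not operand():
            return False
        if pos < len(tokens) and tokens[pos] in OPERATORS:
            pos += 1
            return operand()
        return False

    # Accept a bare expression, or one wrapped in a single pair of parentheses.
    if expression() and pos == len(tokens):
        return True
    pos = 0
    return bool(tokens) and tokens[0] == "(" and operand() and pos == len(tokens)


print(is_valid("(abc AND (xyz OR 123.456))"))
print(is_valid("(AND AND abc)"))
```

The recursion in `operand` is what a plain regular expression cannot express: each opening parenthesis starts a new expression that must be closed before the outer one continues.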

ivan_pozdeev