I'm trying to search a file for expressions such as A*B.

A and B can be made of any characters from [A-Z], [a-z] and [0-9], and may include < > ( ) [ ] _ . etc., but not commas, semicolons, whitespace, newlines or any arithmetic operator (+ - / *). These are the 8 delimiters. There can also be spaces between A, * and B. In addition, the number of opening brackets must equal the number of closing brackets in both A and B.

I unsuccessfully tried something like this (not taking into account operators inside A and B):

import re

# Compile the pattern once, outside the loop
p = re.compile(r"( |,|;)(.*)[*](.*)( |,|;|\n)")
with open("test", "r") as fp:
    for line in fp:
        m = p.match(line)
        if m:
            print('Match found', m.group())
        else:
            print('No match')

Example 1:

(A1 * B1.list(), C * D * E) should give 3 matches:

  1. A1 * B1.list()
  2. C * D
  3. D * E

An extension to the problem statement could be that commas, semicolons, whitespace, newlines and arithmetic operators (+ - / *) are allowed in A and B if they appear inside brackets:

Example 2:

(A * B.max(C * D, E)) should give 2 matches:

  1. A * B.max(C * D, E)
  2. C * D

I'm new to regular expressions and curious to find a solution to this.

OzW
ambuj
  • Could you furnish some examples, please? – Wiktor Stribiżew Aug 24 '15 at 13:31
  • Use `search` ......... `match` tries to match from the beginning. – Avinash Raj Aug 24 '15 at 13:31
  • You probably want to search for one or more non-separator characters, followed by one or more separators, followed by some non-separators again. Check out the `^`. – JimmyB Aug 24 '15 at 13:35
  • According to your current requirements, it is something like [`r'\b[^,;\s+/*-]\s*\*\s*[^,;\s+/*-]\b'`](https://regex101.com/r/jA7kP4/1), but perhaps, you need something really neater. And note you really should use `search`, or `findall`. – Wiktor Stribiżew Aug 24 '15 at 13:37
  • Regular expressions are not a good tool for this particular task. Consider creating a simple parser. – Konstantin Aug 24 '15 at 13:58
  • I think this is a dupe of [`Equation parsing in Python`](http://stackoverflow.com/questions/594266/equation-parsing-in-python). Look at [this demo](http://ideone.com/y5HmFf). How deep can the nested parentheses be? – Wiktor Stribiżew Aug 24 '15 at 14:05
  • [This regex](https://regex101.com/r/jA7kP4/2) is too clumsy, but it works for 1 nested level. – Wiktor Stribiżew Aug 24 '15 at 14:14

1 Answer

Regular expressions have limits. The line between what a regular expression can handle and what calls for a parser is thin. IMO, using a parser is the more robust solution in your case.
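To illustrate where that limit sits, here is a minimal sketch of a regex-only attempt. It is not taken from the comments above; the names `ATOM`, `OPERAND`, `PATTERN` and `find_products_regex` are made up for this example, and it assumes that only `( )` and `[ ]` count as brackets while `<` and `>` are ordinary operand characters. The lookahead lets overlapping matches such as `C * D` and `D * E` both be reported, and the lookbehind stops a match from starting in the middle of an operand.

```python
import re

# One "atom" of an operand: a plain character, or a bracket pair that
# contains no further brackets (so only ONE level of nesting is handled).
ATOM = r"(?:[\w.<>]|\([^()]*\)|\[[^\[\]]*\])"
OPERAND = ATOM + "+"

PATTERN = re.compile(
    r"(?<![\w.<>)\]])"  # do not start in the middle of an operand
    r"(?=(" + OPERAND + r"\s*\*\s*" + OPERAND + r"))"  # capture inside a lookahead
)

def find_products_regex(text):
    """Return every 'A * B' found in text, including overlapping ones."""
    return [m.group(1) for m in PATTERN.finditer(text)]

print(find_products_regex("(A1 * B1.list(), C * D * E)"))
print(find_products_regex("(A * B.max(C * D, E))"))
```

This handles both examples from the question, but only because neither nests brackets more than one level inside an operand. With something like `A * f(g(x))` the second operand is cut off at `f`, and Python's built-in `re` module has no recursion construct to fix that.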

The examples in the question call for recursive patterns (balanced, nested brackets). Here too, a parser is superior to any regex flavor.

Have a look at this proposed solution: [Equation parsing in Python](http://stackoverflow.com/questions/594266/equation-parsing-in-python).
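As a rough idea of what the parsing approach could look like for this specific question, here is a minimal hand-written scanner. It is a sketch rather than the solution from the linked post. The names `scan_operand` and `find_products` are hypothetical, and it assumes that only `( )` and `[ ]` are brackets, that bracket depth is simply counted without checking that the bracket types pair up, and that delimiters are allowed inside brackets (the extension from the question). For every `*` it walks outwards in both directions and keeps the longest operand whose brackets are balanced.

```python
OPENERS = "(["
CLOSERS = ")]"
DELIMITERS = ",;+-/*"  # whitespace is checked separately with isspace()

def scan_operand(text, pos, step):
    """Scan one operand starting at pos, moving right (step=1) or left (step=-1).

    Returns (first, edge): first is the index of the operand character nearest
    the '*', edge is the first index past the operand in the scan direction.
    The operand is empty when first == edge.
    """
    # Scanning left, a closing bracket "enters" a nested group.
    enter, leave = (OPENERS, CLOSERS) if step == 1 else (CLOSERS, OPENERS)
    while 0 <= pos < len(text) and text[pos].isspace():
        pos += step  # skip spaces between '*' and the operand
    first = edge = pos
    depth = 0
    while 0 <= pos < len(text):
        ch = text[pos]
        if ch in enter:
            depth += 1
        elif ch in leave:
            if depth == 0:
                break  # this bracket belongs to the enclosing expression
            depth -= 1
        elif depth == 0 and (ch in DELIMITERS or ch.isspace()):
            break
        pos += step
        if depth == 0:
            edge = pos  # only extend the operand while brackets are balanced
    return first, edge

def find_products(text):
    """Return every 'A * B' in text, in order of the position of '*'."""
    results = []
    for i, ch in enumerate(text):
        if ch != "*":
            continue
        a_first, a_edge = scan_operand(text, i - 1, -1)
        b_first, b_edge = scan_operand(text, i + 1, 1)
        if a_first != a_edge and b_first != b_edge:  # both operands non-empty
            results.append(text[a_edge + 1:b_edge])
    return results

print(find_products("(A1 * B1.list(), C * D * E)"))
print(find_products("(A * B.max(C * D, E))"))
```

Because the nesting depth is just a counter, operands such as `f(g(x))` work at any depth, which is the part a plain regular expression cannot express. To scan a file, `find_products` could be called on each line or on the whole file contents.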

Community
Stephan