I have clauses like a(b,c(d,e(f,g),h(i))), and a string containing several such clauses separated by commas, for example a(b,c(d,e(f,g),h(i))),a(b,c(d,e(f,g),h(i)))

Is there a way to extract the variable and function names in their hierarchical order? Suppose I want to print them as follows:

a
 b
 c
  d
  e
   f
   g
  h
   i 

How can I do this easily using Python's parser? What regex should I use?

bfaskiplar
  • Not a regex. [Won't work for nested structures](http://stackoverflow.com/questions/5454322/python-how-to-match-nested-parentheses-with-regex). – Ray Toal Dec 15 '12 at 02:30
  • So should I do string manipulation? It will be a pain then. – bfaskiplar Dec 15 '12 at 02:31
  • The link in the last comment has some examples in which pyparsing is used. I think they may help you. Also, [this S.O. question](http://stackoverflow.com/questions/4801403/how-can-i-use-pyparsing-to-parse-nested-expressions-that-have-mutiple-opener-clo) might also be helpful. – Ray Toal Dec 15 '12 at 02:33

3 Answers

Regexes aren't good for nested structures. But the string manipulation doesn't have to be a big deal:

s = "a(b,c(d,e(f,g),h(i)))"

import re

level = 0
for tok in re.finditer(r"\w+|[()]", s):
    tok = tok.group()
    if tok == "(":
        level += 1
    elif tok == ")":
        level -= 1
    else:
        # Indent each name by one space per nesting level.
        print("%s%s" % (" "*level, tok))

prints:

a
 b
 c
  d
  e
   f
   g
  h
   i
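
As a possible extension of the loop above (a sketch, not part of the original answer), the same tokenizing approach can be wrapped in a generator that also handles the comma-separated string from the question and reports unbalanced parentheses. The function name `iter_names` and the error messages are hypothetical, chosen only for illustration.

```python
import re

def iter_names(s):
    """Yield (depth, name) pairs for each name in a nested clause string.

    iter_names is a hypothetical helper name, not from the original answer.
    """
    level = 0
    for match in re.finditer(r"\w+|[()]", s):
        tok = match.group()
        if tok == "(":
            level += 1
        elif tok == ")":
            level -= 1
            if level < 0:
                # A ')' appeared with no matching '(' before it.
                raise ValueError("unbalanced ')' at position %d" % match.start())
        else:
            yield level, tok
    if level != 0:
        raise ValueError("%d unclosed '(' at end of input" % level)

# Commas are skipped by the regex, so several clauses in one string work as-is.
s = "a(b,c(d,e(f,g),h(i))),a(b,c(d,e(f,g),h(i)))"
for depth, name in iter_names(s):
    print(" " * depth + name)
```

Yielding pairs instead of printing directly keeps the traversal separate from the output format, so the same generator could feed a different printer or a tree builder.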
Ned Batchelder
>>> s = "a(b,c(d,e(f,g),h(i))),a(b,c(d,e(f,g),h(i)))"
>>> from pyparsing import nestedExpr,Word,alphas,Literal
>>> result = nestedExpr(content=Word(alphas)).ignore(Literal(',')).parseString('('+s+')')
>>> print(result.asList())
[['a', ['b', 'c', ['d', 'e', ['f', 'g'], 'h', ['i']]], 'a', ['b', 'c', ['d', 'e', ['f', 'g'], 'h', ['i']]]]]
>>> def dump(lst,indent=''):
...   for i in lst:
...      if isinstance(i,list):
...        dump(i,indent+' ')
...      else:
...        print (indent,i)
...
>>> dump(result.asList())
  a
   b
   c
    d
    e
     f
     g
    h
     i
  a
   b
   c
    d
    e
     f
     g
    h
     i
PaulMcG

Break the problem into 2 steps:

1. Parse the data
2. Print the data

The best way to parse your data is to find a parser that already exists. If you have a say in the format, pick one that has already been devised: don't make your own. If you don't have a say in the format and are forced to write your own parser, heed Ned's advice and don't use a regex to match the nesting. It will only end in tears.

Once you have parsed the data, print it out with the pprint module. It excels at printing things for human consumption!
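
The two steps might look like the following minimal sketch. It assumes the input uses only word characters, commas and parentheses, and `parse_nested` is a hypothetical helper written for illustration, not an existing library function. It builds nested lists with an explicit stack and hands the result to `pprint`.

```python
import re
from pprint import pprint

def parse_nested(s):
    """Parse 'a(b,c(d))' into ['a', ['b', 'c', ['d']]].

    parse_nested is a hypothetical helper; names go into the current list,
    and each '(' opens a new child list inside it.
    """
    root = []
    stack = [root]  # stack[-1] is the list currently being filled
    for tok in re.findall(r"\w+|[()]", s):
        if tok == "(":
            child = []
            stack[-1].append(child)
            stack.append(child)
        elif tok == ")":
            if len(stack) == 1:
                raise ValueError("unbalanced ')'")
            stack.pop()
        else:
            stack[-1].append(tok)
    if len(stack) != 1:
        raise ValueError("unclosed '('")
    return root

# Step 1: parse the data.
tree = parse_nested("a(b,c(d,e(f,g),h(i)))")
# Step 2: print the data; a small width makes pprint wrap nested lists.
pprint(tree, width=20)
```

Note that `pprint` shows the nested list structure rather than the one-name-per-line indentation asked for in the question, so a small recursive printer may still be needed for that exact layout.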