regular expression for a string format

Question

I have a string like this:

(device
    (vfb
        (xxxxxxxx)
        (xxxxxxxx)
        (location 0.0.0.0:5900)
    )
)

(device
    (console
        (xxxxxxxx)
        (xxxxxxxx)
        (location 80)
    )
)

I need to read the location line from the "vfb" portion of the string. I have tried a regular expression like this:

  import re
  re.findall(r'device.*?\vfb.*?\(.*?(.*?).*(.*?\))

But it doesn't give me the required output.

http://stackoverflow.com/questions/3182594/parsing-s-expressions-in-python — Aditya Mukherji, Dec 26 '12 at 11:02
possible duplicate of [read a multiline string in python](http://stackoverflow.com/questions/14037183/read-a-multiline-string-in-python) — Lev Levitsky, Dec 26 '12 at 11:53

score 3 · Accepted Answer · answered Dec 26 '12 at 12:30

It's better to use a parser for problems like this. Fortunately, a parser would be rather trivial in your case:

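import re

def parse(source):
    """Parse parenthesised expressions into a list of nested dicts."""

    def expr(tokens):
        t = tokens.pop(0)
        if t != '(':
            # a bare token is a leaf value
            return {'value': t}
        # the first token after '(' names the node
        key, val = tokens.pop(0), {}
        while tokens[0] != ')':
            val.update(expr(tokens))
        tokens.pop(0)  # discard the closing ')'
        return {key: val}

    # a token is '(', ')' or a run of characters without whitespace or parentheses
    tokens = re.findall(r'\(|\)|[^\s()]+', source)
    lst = []
    while tokens:
        lst.append(expr(tokens))
    return lst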

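For the input in the question, this produces a structure like the following (the two xxxxxxxx entries collapse into one key because the children are merged into a dict):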

[{'device': {'vfb': {'location': {'value': '0.0.0.0:5900'}, 'xxxxxxxx': {}}}},
 {'device': {'console': {'location': {'value': '80'}, 'xxxxxxxx': {}}}}]

Now you can iterate it and fetch whatever you need:

for item in parse(source):
    try:
        location = item['device']['vfb']['location']['value']
    except KeyError:
        pass
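
If repeated entries matter, a variant that builds nested lists instead of dicts keeps them all. The following is only a sketch along the same lines as the parser above, not part of the original answer; the name `parse_nested` and the one-line `sample` string are made up for illustration, and unbalanced input would simply raise an `IndexError`.

```python
import re

def parse_nested(source):
    """Parse parenthesised expressions into nested lists, keeping duplicates."""
    # Same tokenizer as above: '(', ')' or a run without whitespace/parentheses.
    tokens = re.findall(r'\(|\)|[^\s()]+', source)
    pos = 0

    def expr():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        if tok != '(':
            return tok              # bare token: keep it as a plain string
        items = []
        while tokens[pos] != ')':
            items.append(expr())    # children are appended, never merged
        pos += 1                    # skip the closing ')'
        return items

    result = []
    while pos < len(tokens):
        result.append(expr())
    return result

# Hypothetical one-line sample based on the "vfb" block from the question.
sample = "(device (vfb (xxxxxxxx) (xxxxxxxx) (location 0.0.0.0:5900)))"
tree = parse_nested(sample)
print(tree)
```

Here both xxxxxxxx children survive, at the cost of looking entries up by position or by searching rather than by key.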

score 3 · Answer 2 · answered Dec 26 '12 at 13:02

With that intro from Martijn Pieters, here is a pyparsing approach:

inputdata = """(device
    (vfb
        (xxxxxxxx)
        (xxxxxxxx)
        (location 0.0.0.0:5900)
    )
)

(device
    (console
        (xxxxxxxx)
        (xxxxxxxx)
        (location 80)
    )
)"""

from pyparsing import OneOrMore, nestedExpr

# a nestedExpr defaults to reading space-separated words within nested parentheses
data = OneOrMore(nestedExpr()).parseString(inputdata)

print (data.asList())

# recursive search to walk parsed data to find desired entry
def findPath(seq, path):
    for s in seq:
        if s[0] == path[0]:
            if len(path) == 1:
                return s[1]
            else:
                ret = findPath(s[1:], path[1:])
                if ret is not None:
                    return ret
    return None
print(findPath(data, "device/vfb/location".split('/')))

prints:

[['device', ['vfb', ['xxxxxxxx'], ['xxxxxxxx'], ['location', '0.0.0.0:5900']]], 
 ['device', ['console', ['xxxxxxxx'], ['xxxxxxxx'], ['location', '80']]]]
0.0.0.0:5900
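
The recursive search returns only the first match. If several blocks could share the same path, a generator version might be handy. This is a rough sketch that works on plain nested lists shaped like the output printed above, so it does not need pyparsing; `find_all` and `parsed` are names invented for this example.

```python
# Nested lists copied from the printed output above.
parsed = [
    ['device', ['vfb', ['xxxxxxxx'], ['xxxxxxxx'], ['location', '0.0.0.0:5900']]],
    ['device', ['console', ['xxxxxxxx'], ['xxxxxxxx'], ['location', '80']]],
]

def find_all(seq, path):
    """Yield the value of every entry reachable through the names in path."""
    for s in seq:
        # Skip plain strings, empty lists and entries with a different name.
        if not isinstance(s, list) or not s or s[0] != path[0]:
            continue
        if len(path) == 1:
            if len(s) > 1:
                yield s[1]
        else:
            # Descend into the children of the matching entry.
            yield from find_all(s[1:], path[1:])

vfb = list(find_all(parsed, "device/vfb/location".split('/')))
console = list(find_all(parsed, "device/console/location".split('/')))
print(vfb, console)
```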

Nice - here's a tip, I have always thought it clunky to have to define a string with all the characters in printables except for 1 or 2 special punctuation or delimiters. So now instead of having to create your `no_parens` variable, just for the sake of creating a Word of any printable except for left and right parens, you can do `Word(printables,excludeChars='()')`. — PaulMcG, Dec 26 '12 at 16:14
thanks, I've updated [the gist](http://nbviewer.ipython.org/4380882/). I had pyparsing 1.5.2 version where there is no `excludeChars`. — jfs, Dec 26 '12 at 16:44

score 0 · Answer 3 · answered Dec 26 '12 at 11:21

Maybe this gets you started:

In [84]: data = '(device(vfb(xxxxxxxx)(xxxxxxxx)(location 0.0.0.0:5900)))'

In [85]: m = re.search(r"""
  .....:     vfb
  .....:     .*
  .....:     \(
  .....:         location
  .....:         \s+
  .....:         (
  .....:             [^\)]+
  .....:         )
  .....:     \)""", data, flags=re.X)

In [86]: m.group(1)
Out[86]: '0.0.0.0:5900'
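
Note that the session above uses a single-line string. On the multi-line string from the question, `.` does not match newlines by default, and with `re.S` added the greedy `.*` would run on to the last location (80). A possible adaptation, offered as a sketch rather than a robust solution, uses a lazy `.*?` together with `re.DOTALL`; the variable names are illustrative. It still assumes the vfb block contains a location entry, otherwise the lazy match would continue into the next block.

```python
import re

text = """(device
    (vfb
        (xxxxxxxx)
        (xxxxxxxx)
        (location 0.0.0.0:5900)
    )
)

(device
    (console
        (xxxxxxxx)
        (xxxxxxxx)
        (location 80)
    )
)"""

# Lazy '.*?' stops at the first '(location' after '(vfb';
# re.DOTALL lets '.' cross line breaks.
pattern = re.compile(r"\(vfb\b.*?\(location\s+([^\s()]+)\)", re.DOTALL)

match = pattern.search(text)
vfb_location = match.group(1) if match else None
print(vfb_location)
```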

edited Dec 26 '12 at 11:48

answered Dec 26 '12 at 11:21

pemistahl

