2

I want to parse configuration files like apache2.conf, which looks like this:

<Group group1>
   param1 1

   <SomeGroup group3>
      param3 3
   </SomeGroup>
</Group>

<Group group2>
   param2 2
</Group>

Regexp:

re.findall(r'\</?[^\>]+\>([\s\S]+)\<//?[^\>]+\>', text, re.MULTILINE)

If I use a lazy regexp, the match is cut off like this:

<Group group1>
   param1 1

   <SomeGroup group3>
      param3 3
   </SomeGroup>

If I use a greedy regexp, it matches the whole text. So what is the correct way to parse this? Or are there any libraries for it?

artem
  • __Don't parse XML with regex.__ http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454 – Katriel Jul 14 '11 at 12:51
  • @katrielalex: apache config files are not valid XML. – Mat Jul 14 '11 at 12:52
  • possible duplicate of [Any python libs for parsing apache config files?](http://stackoverflow.com/questions/237209/any-python-libs-for-parsing-apache-config-files) – Katriel Jul 14 '11 at 12:53
  • Still, using regexes on this kind of data is just not a good idea. Build a parser if one does not already exist. A regex (if even possible) will become horribly convoluted quickly. – carlpett Jul 14 '11 at 12:54
  • @katrielalex not a duplicate - the main question is how to write correct regexp. – artem Jul 14 '11 at 12:59
  • You need to find `param 1-3`? – Kirill Polishchuk Jul 14 '11 at 13:00

2 Answers

2

Augeas has Python bindings.

Flavius
1

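There is no way to do this with a regexp alone. A regexp engine keeps no state to track how deeply blocks are nested, so it can only parse very simple input. See here for other options: [Any python libs for parsing apache config files?](http://stackoverflow.com/questions/237209/any-python-libs-for-parsing-apache-config-files)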
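To illustrate what "state" means here, below is a minimal sketch of a line-based parser that uses an explicit stack to track nesting. It is not a full Apache config parser: the names `parse_config`, `OPEN_RE` and `CLOSE_RE` are made up for this example, and it assumes one tag or one `key value` directive per line, as in the sample from the question. It does not handle line continuations, quoted arguments, case-insensitive tag names, or repeated directives (a later value overwrites an earlier one).

```python
import re

# Assumed line shapes, inferred from the sample in the question:
#   <Name arg>   opens a block, </Name> closes it, anything else is "key value".
OPEN_RE = re.compile(r'<(\w+)\s*([^>]*)>$')
CLOSE_RE = re.compile(r'</(\w+)>$')


def parse_config(text):
    """Parse nested <Name arg> ... </Name> blocks into a tree of dicts."""
    root = {"name": None, "arg": None, "params": {}, "children": []}
    stack = [root]  # the stack is the state that a single regexp lacks

    for lineno, raw in enumerate(text.splitlines(), 1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments

        close = CLOSE_RE.match(line)
        if close:
            # A closing tag must match the innermost open block.
            if len(stack) == 1 or stack[-1]["name"] != close.group(1):
                raise ValueError(f"line {lineno}: unexpected {line}")
            stack.pop()
            continue

        opened = OPEN_RE.match(line)
        if opened:
            node = {
                "name": opened.group(1),
                "arg": opened.group(2).strip(),
                "params": {},
                "children": [],
            }
            stack[-1]["children"].append(node)
            stack.append(node)
            continue

        # Plain directive: first word is the key, the rest is the value.
        key, _, value = line.partition(" ")
        stack[-1]["params"][key] = value.strip()

    if len(stack) != 1:
        raise ValueError(f"unclosed <{stack[-1]['name']}>")
    return root
```

Calling `parse_config(text)` on the sample from the question should give a root node with two `Group` children, the first of which contains the nested `SomeGroup` block. Regexps are still used, but only to classify a single line; the nesting itself is handled by the stack.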

Community
Aaron Digulla