
I'm parsing a log with Python and need to quickly fetch some values from it.

This is a simplified equivalent regex and usage example:

import re
pat = re.compile(r"(1(2[3456]+2)+1)*")

It doesn't work as expected: `pat.match().groups()` returns only the last match of each group.
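A minimal reproduction, assuming Python 3 and using the sample string `to_match` from the question. When a capturing group sits inside a quantifier, the `re` module keeps only the text from that group's last iteration, even though the pattern as a whole matched everything.

```python
import re

# The question's pattern: group 1 is repeated by the outer `*`,
# group 2 by the inner `+`.
pat = re.compile(r"(1(2[3456]+2)+1)*")
to_match = "1232112542254211232112322421"

m = pat.match(to_match)
print(m.group(0))  # the whole string: the pattern matched all of it
print(m.groups())  # ('12322421', '242'): only the last iteration of each group
```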

What is the simplest solution for such problems?

Update (the site asks to edit the question rather than create a new post):

I need repeated matches, of course.

to_match = "1232112542254211232112322421"

The regex search would have to be applied twice, recursively (once for the outer blocks, then again inside each block). I can live with that, but are there any other options?
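A minimal sketch of that two-pass approach, assuming Python 3; the names `outer`, `inner`, `blocks` and `pieces` are illustrative. The outer pattern uses a non-capturing group `(?:...)` so that `findall` returns whole blocks, and a second `findall` then collects every inner piece from each block.

```python
import re

to_match = "1232112542254211232112322421"

# Pass 1: every outer "1...1" block; (?:...) keeps the inner group from capturing.
outer = re.compile(r"1(?:2[3456]+2)+1")
# Pass 2: every "2...2" piece inside one block.
inner = re.compile(r"2[3456]+2")

blocks = outer.findall(to_match)
pieces = [inner.findall(b) for b in blocks]
for block, parts in zip(blocks, pieces):
    print(block, parts)
```

One difference from the original `match` with `*`: `findall` also picks up blocks that are not adjacent to each other, while the anchored `*` only accepts a contiguous run starting at the beginning of the string.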

ayvango
  • Change `*` to `?`. `"(1(2[3456]+2)+1)?"` – Prince John Wesley Nov 08 '11 at 04:22
  • possible duplicate of [Python regex multiple groups](http://stackoverflow.com/questions/4963691/), [Regular expression group capture with multiple matches](http://stackoverflow.com/questions/5598340/), [Python regexes: How to access multiple matches of a group?](http://stackoverflow.com/questions/5060659/). – outis Dec 28 '11 at 02:55

2 Answers


You are repeating a captured group instead of capturing a repeated group, which is why you get only the last capture.

You should be using:

pat = re.compile("((1(2[3456]+2)+1)*)")

For more on repeating a captured group vs. capturing a repeated group, see http://www.regular-expressions.info/captureall.html
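A quick check, assuming Python 3 and the sample string `to_match` from the question: the extra outer group captures the text of all repetitions as one string (here identical to `group(0)`), while the groups inside the quantifier still report only their last iteration.

```python
import re

to_match = "1232112542254211232112322421"

# The answer's pattern: an extra capturing group around the repeated part.
pat = re.compile(r"((1(2[3456]+2)+1)*)")
m = pat.match(to_match)

print(m.group(1) == m.group(0))  # True: group 1 spans every repetition
print(m.group(2), m.group(3))    # 12322421 242: still the last iteration only
```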

Narendra Yadala
  • It makes no sense to put brackets around the whole pattern. The match of the whole pattern is already stored in `.group(0)`. In your solution `.group(0)` and `.group(1)` are then the same. – stema Nov 08 '11 at 07:09
  • @stema You are right, but evidently OP is looking at `group(1)` from what he says in the question. There is also a possibility that the regex given here is only the part of the actual regex that is causing OP a problem and there might be something to the left/right of the given regex in which case `group(0)` might not be what is needed. – Narendra Yadala Nov 08 '11 at 07:24

OK, try this (but only after you've learned how to accept answers ;-) )

import re
s = "123321124421125521"
pat = re.compile(r"(1(2[3456]+2)+1)")
print(pat.findall(s))

Remove the quantifier and use `findall()` instead. This results in the following list:

[('123321', '2332'), ('124421', '2442'), ('125521', '2552')]
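The same approach applied to the question's `to_match` string, as a Python 3 sketch. `findall` returns one `(outer, inner)` tuple per outer block, but the inner group sits inside a `+`, so within each block it still keeps only its last iteration.

```python
import re

to_match = "1232112542254211232112322421"

# No quantifier on the outer group; findall() collects every outer block.
pat = re.compile(r"(1(2[3456]+2)+1)")
matches = pat.findall(to_match)
for outer, inner in matches:
    print(outer, inner)
```

In the last tuple the inner group reports only `'242'`, although the block `'12322421'` also contains `'232'`, so fully nested repetitions still need a second pass over each block.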

stema