
Is there any way to combine capturing groups and the * quantifier in regular expressions so they act kind of like a tokenizer/splitter? I tried this:

import re

my_str = "foofoofoofoo"
pattern = "(foo)*"
result = re.search(pattern, my_str)

I was hoping result.groups() would look like

("foo", "foo", "foo", "foo")

But it does not: result.groups() is just ('foo',). I was surprised by this, because the ? quantifier and groups do work together:

my_str = "Mr foo"
pattern = "(Mr)? foo"
result = re.search(pattern, my_str)
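For reference, here is a minimal sketch of what the two snippets above actually return with Python's built-in re module. The variable names `repeated` and `optional` are chosen here just to keep the two results apart:

```python
import re

# Repeated capturing group: the whole string matches,
# but group 1 holds only a single value.
repeated = re.search("(foo)*", "foofoofoofoo")
print(repeated.group(0))  # foofoofoofoo
print(repeated.groups())  # ('foo',)

# Optional group: ? and a capturing group combine as expected.
optional = re.search("(Mr)? foo", "Mr foo")
print(optional.groups())  # ('Mr',)
```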
Asked by D.C.; edited by jamylak.
  • I doubt that would work, but you can get what you want using `re.findall("foo", "foofoofoofoo")`. Oh, and please don't use `str` as a variable name. – Shawn Chin Jul 10 '12 at 08:24
  • I changed `str` to `my_str` since `str` shadows the built-in. – jamylak Jul 10 '12 at 08:28
  • Ha, yeah, sorry; `str` was just an example, and that code probably isn't syntactically correct. I did see the findall method, and that would definitely work. I was just curious in a more general sense. – D.C. Jul 10 '12 at 08:28
  • @darren http://sscce.org/#co :D – jamylak Jul 10 '12 at 08:29

2 Answers


The problem is that you repeat your only capturing group. You have only one pair of parentheses, so there is only one capturing group, and that group is overwritten each time it matches.

See "Repeating a Capturing Group vs. Capturing a Repeated Group" on regular-expressions.info for more information. (Capturing a repeated group is not what you want either.)

So after the match completes, capturing group 1 contains only the last "foo" that was matched.
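A small sketch illustrating both points. It uses a made-up input where each repetition matches different text, so the overwriting is visible. The second pattern wraps the repetition in a non-capturing group inside a capturing one. That is the "capturing a repeated group" form, and it captures the whole run as one string, not a list:

```python
import re

# Repeating a capturing group: each iteration overwrites group 1.
m = re.match(r"(foo|bar|baz)*", "foobarbaz")
print(m.group(1))  # baz  (only the last iteration survives)
print(m.span(1))   # (6, 9)

# Capturing a repeated group: group 1 spans all iterations as a single string.
m2 = re.match(r"((?:foo|bar|baz)*)", "foobarbaz")
print(m2.group(1))  # foobarbaz
```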

This would give you the expected result:

import re

my_str = "foofoofoofoo"
pattern = "foo"
result = re.findall(pattern, my_str)

result is then the list ['foo', 'foo', 'foo', 'foo'].
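A sketch of two findall details that matter when using it as a simple tokenizer, with made-up inputs. If the pattern contains exactly one capturing group, findall returns that group's text for each match. An alternation lets it pull out several kinds of tokens in order. Note that findall silently skips characters that no alternative matches:

```python
import re

# One capturing group: findall returns the group's text for every match.
grouped = re.findall("(foo)", "foofoofoofoo")
print(grouped)  # ['foo', 'foo', 'foo', 'foo']

# Alternation: tokens of different kinds, in the order they appear.
tokens = re.findall("foo|bar", "foobarfoo")
print(tokens)  # ['foo', 'bar', 'foo']

# Unmatched characters (here "x") are skipped, not reported.
skipped = re.findall("foo|bar", "fooxbar")
print(skipped)  # ['foo', 'bar']
```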

Answered by stema

With the built-in re module, a repeated capturing group keeps only its last match, so combining capture groups with * won't give you a list of tokens. Use findall instead.
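If you also need to know where each token starts and ends, as a tokenizer usually does, re.finditer yields match objects instead of plain strings. A minimal sketch:

```python
import re

# finditer yields one match object per non-overlapping match, left to right.
spans = [(m.group(), m.start(), m.end()) for m in re.finditer("foo", "foofoofoofoo")]
print(spans)  # [('foo', 0, 3), ('foo', 3, 6), ('foo', 6, 9), ('foo', 9, 12)]
```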

There is a third-party library called regex on PyPI that supports this: it keeps every capture of a repeated group. It also has a few other features, such as variable-length lookbehind.
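In the regex module, a match object's captures() method returns every value a group captured (for example, m.captures(1) after matching (foo)*). If you would rather stay with the standard library, the hypothetical helper `all_captures` below approximates that for simple cases. It matches the repeated pattern once, then re-scans the matched text with finditer using the inner pattern. This is only an approximation. It assumes the inner pattern splits the run the same way the repetition did, which holds for fixed literals like foo but not for every pattern:

```python
import re


def all_captures(inner: str, text: str) -> list[str]:
    """Hypothetical helper approximating regex's captures() for (inner)*.

    Matches the repeated pattern at the start of text, then re-scans only
    that matched run for each occurrence of the inner pattern.
    """
    # (?:...)* always matches (possibly the empty string), so run is never None.
    run = re.match(f"(?:{inner})*", text)
    return [m.group() for m in re.finditer(inner, run.group())]


print(all_captures("foo", "foofoofoofoo"))  # ['foo', 'foo', 'foo', 'foo']
print(all_captures("foo", "foofoobar"))     # ['foo', 'foo']
```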

Answered by Jon Clements