
Suppose I have a string:

string = 'AvBvC'

I want to match A, B, and C, and this is what I did:

import re
match = re.search('(.*)v(.*)', string)
print(match.groups())

The problem is that the result is:

('AvB', 'C',)

instead of what I want, which is

('A', 'B', 'C',)

How do I make it catch all overlapping patterns?

Thanks.

(I know there are some posts concerning the same issue, but I haven't found a definitive answer for Python.)

hwnd
user2492270
  • Please spell out the *exact* output you want. Can't guess. – Tim Peters Dec 08 '13 at 04:01
  • @Tim Peters sorry I want ('A', 'B', 'C',) – user2492270 Dec 08 '13 at 04:01
  • @PeterDeGlopper Yeah! I just modified my post (my original code is much longer and more complicated than this, sorry) – user2492270 Dec 08 '13 at 04:04
  • More details would help - you can do what the above question asks with just `split('v')`, so I can only assume you have a more complicated situation. – Peter DeGlopper Dec 08 '13 at 04:06
  • This may be what you are after? http://stackoverflow.com/questions/5616822/python-regex-find-all-overlapping-matches – OllyTheNinja Dec 08 '13 at 04:06
  • @PeterDeGlopper http://stackoverflow.com/questions/20449800/python-regex-nested-parenthesis – user2492270 Dec 08 '13 at 04:13
  • If I remember my computing theory correctly, standard regular expressions can't parse nested parens. Some languages have recursion extensions to their regexps but python does not. – Peter DeGlopper Dec 08 '13 at 04:33
  • By default, regular expressions are greedy, so they will try to match as much as possible. Hence, `.*v.*`, which matches any run of characters, will match `('AvB', 'C')`. Please read the entirety of http://docs.python.org/2/library/re.html – IceArdor Dec 08 '13 at 04:40
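The greediness described in the last comment can be seen directly by comparing `.*` with its non-greedy form `.*?`. This is only a minimal sketch using the string from the question; the variable names `greedy` and `lazy` are chosen here for illustration.

```python
import re

string = 'AvBvC'

# Greedy: the first .* grabs as much as it can, so the split lands on the LAST 'v'.
greedy = re.search(r'(.*)v(.*)', string).groups()

# Non-greedy: the first .*? grabs as little as it can, so the split lands on the FIRST 'v'.
lazy = re.search(r'(.*?)v(.*)', string).groups()

print(greedy)  # ('AvB', 'C')
print(lazy)    # ('A', 'BvC')
```

Either way, a pattern with two groups returns two pieces, so neither form produces all three parts in one `search()` call.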

2 Answers


Your question is somewhat unclear; you seem to have a more complicated string than the one you actually show.

search() matches only the first occurrence; use findall() to match all occurrences.

>>> re.findall(r'[^v]+', string)
['A', 'B', 'C']

Another option is to split on the delimiter characters:

>>> re.split('v', 'AvBvC')
['A', 'B', 'C']
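To make the difference between `search()` and `findall()` concrete, here is a small sketch. It assumes a slightly longer input, `'AAvBBvCC'`, which is not from the question, and adds `re.finditer()` for cases where match positions are also needed.

```python
import re

string = 'AAvBBvCC'  # assumed input with multi-character pieces

# search() stops after the first match.
first = re.search(r'[^v]+', string).group()

# findall() returns every non-overlapping match as a list of strings.
pieces = re.findall(r'[^v]+', string)

# finditer() yields match objects, which also carry the start position.
spans = [(m.group(), m.start()) for m in re.finditer(r'[^v]+', string)]

print(first)   # AA
print(pieces)  # ['AA', 'BB', 'CC']
print(spans)   # [('AA', 0), ('BB', 3), ('CC', 6)]
```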
hwnd
  • This works for my example, but my actual string contains pieces that are not single characters like A, B, C. So is there a way to do this using "search" instead of "findall"? – user2492270 Dec 08 '13 at 04:06
  • `search()` and `match()` return exactly 1 result. So, no. You need to make your question clearer ;-) – Tim Peters Dec 08 '13 at 04:07
  • @TimPeters Why doesn't negative lookahead allow me to match the beginning of the string? – thefourtheye Dec 08 '13 at 04:09
  • @thefourtheye, huh? Without some context (or code), I don't know what you're asking - sorry. – Tim Peters Dec 08 '13 at 04:10
  • @TimPeters `print re.findall("(?<=v|^).*?(?=v|$)", myString)` I tried this and it throws `sre_constants.error: look-behind requires fixed-width pattern` – thefourtheye Dec 08 '13 at 04:11
  • @hwnd http://stackoverflow.com/questions/20449800/python-regex-nested-parenthesis – user2492270 Dec 08 '13 at 04:12
  • @thefourtheye, that's look-behind, not look-ahead ;-) The error msg explained it: all alternatives in a lookbehind must match the same number of characters. `v` matches 1 character but `^` matches 0 characters. That's all there is to it. – Tim Peters Dec 08 '13 at 04:13
  • @TimPeters Some people refer to that as negative look-ahead ;) If that is the case, why does look-ahead accept `$`? – thefourtheye Dec 08 '13 at 04:15
  • @thefourtheye, then some people are bound to get confused by sloppy terminology ;-) The "fixed width" restriction is unique to look-behinds - it does not apply to look-aheads. The reason is simply the sheer difficulty of implementing varying-width look-behinds; they're "highly unnatural" for a "left to right" search engine. – Tim Peters Dec 08 '13 at 04:17
  • @TimPeters Cool :) Thanks :) BTW, you mind [joining us](http://chat.stackoverflow.com/rooms/6/python) sometime – thefourtheye Dec 08 '13 at 04:19
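The look-behind error discussed in the comments above can be reproduced, and one possible way around it is to move the alternation outside the look-behind so that each branch is fixed-width on its own. This is a sketch of that idea, not something proposed in the thread; `my_string`, `rejected`, and `pieces` are illustrative names.

```python
import re

my_string = 'AvBvC'

# The pattern from the comment mixes a 1-character alternative (v) and a
# 0-character alternative (^) inside one look-behind, which re rejects.
try:
    re.compile(r'(?<=v|^).*?(?=v|$)')
    rejected = False
except re.error:
    rejected = True

# Assumed workaround: alternate between a fixed-width look-behind and a
# bare ^ anchor instead of putting both inside the look-behind.
pieces = re.findall(r'(?:(?<=v)|^).*?(?=v|$)', my_string)

print(rejected)  # True
print(pieces)    # ['A', 'B', 'C']
```

Note that on input with two adjacent delimiters this pattern can also return empty strings, so for simple cases `re.split` is likely the easier tool.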

Use re.split

>>> import re
>>> re.split('v', 'AvBvC')
['A', 'B', 'C']

It also works with a multi-character delimiter:

>>> re.split('vw', 'AAvwBBvwCC')
['AA', 'BB', 'CC']
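As a further sketch, the examples below compare plain `str.split` with `re.split` on a delimiter that varies. The mixed-delimiter input `'AAvBBvwCC'` and the pattern `vw?` are assumptions made for illustration, not taken from the question.

```python
import re

# For a fixed delimiter, plain str.split gives the same result without a regex.
plain = 'AAvwBBvwCC'.split('vw')

# re.split is useful when the delimiter is a pattern, here 'v' or 'vw'.
mixed = re.split(r'vw?', 'AAvBBvwCC')

# Wrapping the delimiter in a capturing group keeps the delimiters in the result.
kept = re.split(r'(vw?)', 'AAvBBvwCC')

print(plain)  # ['AA', 'BB', 'CC']
print(mixed)  # ['AA', 'BB', 'CC']
print(kept)   # ['AA', 'v', 'BB', 'vw', 'CC']
```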
FogleBird