Thank you in advance for reading.

I have a string:

A = "a levels"

I want to match all of the following possible variations of A level:

Pattern = r"a level|a levels"

(The form of this pattern is set, I cannot change it.) Following the search, I desire to get:

["a level","a levels"]

I use findall as follows:

B = re.findall(Pattern,A)

and get:

B = ["a level"]

re.findall only matches the first term and ignores the second overlapping term.
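A minimal sketch of the behaviour described above, using only the standard `re` module. At each position the alternatives are tried left to right, and `findall` resumes scanning after the end of the previous match, so only one of the two terms can be reported. The reordered pattern is shown purely for illustration, since the real Pattern is fixed.

```python
import re

text = "a levels"
pattern = r"a level|a levels"

# Alternatives are tried left to right at each position: "a level" succeeds
# first at index 0, so "a levels" is never attempted there.
matches = re.findall(pattern, text)
print(matches)  # ['a level']

# Reordering the alternatives changes which one wins, but findall still
# returns a single, non-overlapping match.
reordered = re.findall(r"a levels|a level", text)
print(reordered)  # ['a levels']
```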

Per: Python regex find all overlapping matches? I tried using:

B = re.findall(Pattern,A,overlapped = True)

and get the following error:

TypeError: findall() got an unexpected keyword argument 'overlapped'

Evidently re.findall does not accept an overlapped keyword argument...

I then looked at this question: Python regex find all overlapping matches? and tried:

C = re.finditer(Pattern,A)
results = [match.group() for match in C]

results = ["a level"]

So no better.

How can I get the output I desire?

Related question: How to find overlapping matches with a regexp?

Chuck
  • You may only match overlapping strings at different indices. – Wiktor Stribiżew Dec 08 '16 at 17:19
  • I am not sure whether it's possible to achieve what you want, but the overlapped error can be resolved via `pip install regex` and then `import regex as re`. regex is a newer regex module for Python. – saurabh baid Dec 08 '16 at 17:30
  • @saurabhbaid. Unfortunately, the `overlapped` option in `regex` will not resolve the problem here, as it does not work with alternation. – ekhumoro Dec 08 '16 at 17:57
  • I did not know there was a separate re and regex module. Thank you for the information. @ekhumoro Thanks for telling me the word for what I was trying to convey (seriously - I was a bit wordy without it). – Chuck Dec 08 '16 at 17:59

1 Answer

If every possible Pattern is similar to what you've shown, this might work for you:

B=[b for pat in Pattern.split('|') for b in re.findall(pat, A)]

Of course, this doesn't generalize beyond Pattern being a set of simple alternatives.
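As a runnable sketch of the one-liner above: the helper name `find_all_alternatives` is hypothetical, and the code assumes Pattern consists only of plain alternatives separated by `|`, with no groups, character classes, or escaped pipes.

```python
import re

def find_all_alternatives(pattern, text):
    """Run each top-level alternative separately and collect every match."""
    results = []
    # Naive split: only valid when "|" never appears inside groups,
    # character classes, or as an escaped literal.
    for alternative in pattern.split("|"):
        results.extend(re.findall(alternative, text))
    return results

A = "a levels"
Pattern = r"a level|a levels"
B = find_all_alternatives(Pattern, A)
print(B)  # ['a level', 'a levels']
```

Note that any spaces around `|` in Pattern become part of the neighbouring alternatives after the split, so `"a level | a levels"` would search for `"a level "` and `" a levels"`.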

Robᵩ
  • Thank you Rob. So it looks like you split the string by `|` and then carry out `.findall` on each of the split elements if I understand correctly? The funny thing is, I started with a version of Pattern that looked like `Pattern = ["a level", "a levels"...]` and converted it to `Pattern = "a level | a levels..."` Maybe I can get rid of that, and then implement only the `findall` part of your answer... When all is together, I will see which way is faster and pick that. Thanks for the help :) – Chuck Dec 08 '16 at 17:52
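If the patterns are kept as a list, as the comment above suggests, the split step is not needed at all. The following is only a sketch under that assumption; the names `patterns`, `compiled`, and `find_each` are hypothetical. Compiling each pattern once up front may help when many strings are searched, though that would need to be measured.

```python
import re

# Assumed starting point: the alternatives kept as separate strings.
patterns = ["a level", "a levels"]

# Compile once so repeated searches reuse the compiled objects.
compiled = [re.compile(p) for p in patterns]

def find_each(text):
    """Return every match of every pattern, in pattern order."""
    return [m.group() for rx in compiled for m in rx.finditer(text)]

print(find_each("a levels"))  # ['a level', 'a levels']
```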