I am trying to find a way to split a string on "," and "or", even when they are repeated. For example, calling re.split() on the string "one , , or or, two" should return "one" and "two".

So far this is what I have (using Python 3.10):

import re

pattern = re.compile(r"(?:\s*,\s*or\s*|\s*,\s*|\s+or\s+)+", flags=re.I)
string = "one,two or three   ,   four   or   five  or , or six , oR   ,  seven, ,,or,   ,, eight or qwertyor orqwerty,"
result = re.split(pattern, string)
print(result)

which returns:

['one', 'two', 'three', 'four', 'five', 'six', 'seven', 'eight', 'qwertyor orqwerty', '']

My issue is that when the string contains consecutive "or"s, my pattern only recognizes every other one. For example:

string = "one or or two"
>>> ['one', 'or two']

string = "one or or or two"
>>> ['one', 'or', 'two']

Notice that in the first example the second element contains "or", and in the second example "or" is an element by itself.

Is there a way to get around this? Also, if there is a better way to split these strings, I would greatly appreciate hearing about it.

Bob

2 Answers

You can use

import re
text = "one,two or three   ,   four   or   five  or , or six , oR   ,  seven, ,,or,   ,, eight or qwertyor orqwerty,"
print( re.split(r'(?:\s*(?:,|\bor\b))+\s*', text.rstrip().rstrip(',')) )
# => ['one', 'two', 'three', 'four', 'five', 'six', 'oR', 'seven', 'eight', 'qwertyor orqwerty']

See the Python demo and the regex demo.

Details:

  • (?:\s*(?:,|\bor\b))+ - one or more repetitions of
    • \s* - zero or more whitespaces
    • (?:,|\bor\b) - either a comma or the whole word "or"
  • \s* - zero or more whitespaces.
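
As a quick check of the breakdown above, the sketch below runs the pattern from the question and the pattern from this answer on the two problem strings. In the question's pattern the "or" branch requires whitespace on both sides, so the first " or " consumes the space that the next "or" would need in front of it and the repetition stops early. The names question_pattern and answer_pattern are only illustrative.

```python
import re

# Pattern from the question: its "or" branch needs whitespace on BOTH sides.
question_pattern = re.compile(r"(?:\s*,\s*or\s*|\s*,\s*|\s+or\s+)+", flags=re.I)
# Pattern from this answer: whitespace is optional before each separator.
answer_pattern = re.compile(r"(?:\s*(?:,|\bor\b))+\s*")

for text in ("one or or two", "one or or or two"):
    print(question_pattern.split(text), answer_pattern.split(text))
# ['one', 'or two'] ['one', 'two']
# ['one', 'or', 'two'] ['one', 'two']
```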

Note the use of non-capturing groups: this is crucial because re.split also returns the text matched by any capturing groups in the pattern.
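
To illustrate the difference, here is a small sketch that splits the same string twice, once with capturing groups and once with non-capturing ones. The variable names are made up for this example.

```python
import re

text = "one or or two"

# Capturing groups: re.split also returns what each group last matched.
with_capturing = re.split(r"(\s*(,|\bor\b))+\s*", text)
print(with_capturing)
# ['one', ' or', 'or', 'two']

# Non-capturing groups: only the text between separators is returned.
without_capturing = re.split(r"(?:\s*(?:,|\bor\b))+\s*", text)
print(without_capturing)
# ['one', 'two']
```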

Also, note the text.rstrip().rstrip(',') call: it removes trailing whitespace and then trailing commas, so that there is no trailing empty item in the result.
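
If separators can also appear at the start of the string, or you would rather not pre-trim the input, one possible alternative is to filter out empty items after splitting. The helper below is only a sketch: split_items and SEPARATORS are hypothetical names, and it assumes you want case-insensitive matching (as with the re.I flag in the question), so that "oR" is treated as a separator instead of being returned as an item.

```python
import re

# Assumption: "or" should match in any letter case, as in the question's pattern.
SEPARATORS = re.compile(r"(?:\s*(?:,|\bor\b))+\s*", flags=re.IGNORECASE)

def split_items(text):
    """Split on runs of commas and the whole word "or", dropping empty items."""
    parts = SEPARATORS.split(text)
    # A leading or trailing separator produces an empty string; skip those.
    return [part.strip() for part in parts if part.strip()]

print(split_items("one , , or or, two"))
# ['one', 'two']
print(split_items(", one oR two,"))
# ['one', 'two']
```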

Wiktor Stribiżew

Does Python support the word boundary assertion \b? If so, you could probably simplify the regular expression to something along the following lines:

\s*((,|\bor\b)\s*)+
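
Python's re module does support \b, provided the pattern is written as a raw string (in a normal string literal, "\b" is a backspace character). The sketch below tries the pattern above with one adaptation that is not part of the pattern as posted: the groups are made non-capturing, since re.split would otherwise include the captured text in its result.

```python
import re

# In a normal string "\b" is a backspace; in a raw string it stays as
# backslash + "b", which the re module reads as a word boundary.
print("\b" == "\x08", len(r"\b"))
# True 2

# The pattern above, with both groups changed to non-capturing (?:...).
pattern = re.compile(r"\s*(?:(?:,|\bor\b)\s*)+")

print(pattern.split("one , , or or, two"))
# ['one', 'two']
print(pattern.split("qwertyor orqwerty"))
# ['qwertyor orqwerty']
```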
Shaun Cockerill