I have many strings that I need to split on commas. Example:

myString = r'test,Test,NEAR(this,that,DISTANCE=4),test again,"another test"'
myString = r'test,Test,FOLLOWEDBY(this,that,DISTANCE=4),test again,"another test"'

My desired output would be:

["test", "Test", "NEAR(this,that,DISTANCE=4)", "test again", '"another test"']  # list length = 5

I can't figure out how to keep the commas between "this,that,DISTANCE" in one item. I tried this:

l = re.compile(r',').split(myString) # matches all commas
l = re.compile(r'(?<!\(),(?=\))').split(myString) # (negative lookbehind / positive lookahead) - no matches at all

Any ideas? Let's say the list of allowed "functions" is defined as:

f = ["NEAR","FOLLOWEDBY","AND","OR","MAX"]
michal111
    Possible duplicate of [How to split by commas that are not within parentheses?](https://stackoverflow.com/questions/26633452/how-to-split-by-commas-that-are-not-within-parentheses) – Austin Nov 22 '18 at 14:20

2 Answers

You may use

(?:\([^()]*\)|[^,])+

See the regex demo.

The (?:\([^()]*\)|[^,])+ pattern matches one or more occurrences of either a parenthesized substring that contains no further ( or ) characters, or any single character other than a comma. Note that it does not handle nested parentheses.

See the Python demo:

import re
rx = r"(?:\([^()]*\)|[^,])+"
s = 'test,Test,NEAR(this,that,DISTANCE=4),test again,"another test"'
print(re.findall(rx, s))
# => ['test', 'Test', 'NEAR(this,that,DISTANCE=4)', 'test again', '"another test"']
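
If the "functions" can be nested (for example NEAR(this,MAX(a,b),DISTANCE=4)), the pattern above would split inside the inner call. One possible alternative is to skip regex and track parenthesis depth by hand. The following is only a minimal sketch: the helper name split_top_level and the nested sample string are made up for illustration, and it assumes parentheses are balanced and that quotes need no special treatment.

```python
def split_top_level(text, sep=","):
    """Split text on sep, ignoring separators nested inside parentheses."""
    parts = []
    current = []
    depth = 0  # number of currently open parentheses
    for ch in text:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth = max(depth - 1, 0)  # tolerate a stray closing parenthesis
        if ch == sep and depth == 0:
            # Top-level separator: finish the current item.
            parts.append("".join(current))
            current = []
        else:
            current.append(ch)
    parts.append("".join(current))
    return parts


# Hypothetical input with a nested call, not taken from the question.
s = 'test,Test,NEAR(this,MAX(a,b),DISTANCE=4),test again,"another test"'
print(split_top_level(s))
# => ['test', 'Test', 'NEAR(this,MAX(a,b),DISTANCE=4)', 'test again', '"another test"']
```

Unlike the findall approach, this sketch keeps empty fields, so 'a,,b' yields three items.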
Wiktor Stribiżew

If you explicitly want to specify which strings count as functions, you need to build the regex dynamically. Otherwise, go with Wiktor's solution.

>>> import re
>>> functions = ["NEAR","FOLLOWEDBY","AND","OR","MAX"]
>>> funcs = '|'.join(r'{}\([^)]+\)'.format(re.escape(f)) for f in functions)
>>> regex = '({})|,'.format(funcs)
>>>
>>> myString1 = 'test,Test,NEAR(this,that,DISTANCE=4),test again,"another test"'
>>> list(filter(None, re.split(regex, myString1)))
['test', 'Test', 'NEAR(this,that,DISTANCE=4)', 'test again', '"another test"']
>>> myString2 = 'test,Test,FOLLOWEDBY(this,that,DISTANCE=4),test again,"another test"'
>>> list(filter(None, re.split(regex, myString2)))
['test',
 'Test',
 'FOLLOWEDBY(this,that,DISTANCE=4)',
 'test again',
 '"another test"']
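
The same idea could be wrapped in a reusable function. This is a sketch rather than a definitive implementation: the name split_keeping_functions is hypothetical, and the \b word boundary is an addition of mine (not part of the session above) so that a short name such as OR does not match at the end of a longer word. It assumes function names start with a word character and that calls are not nested.

```python
import re


def split_keeping_functions(text, functions):
    """Split text on commas, keeping calls to the listed functions intact."""
    # re.escape makes any regex metacharacters in a name literal.
    calls = "|".join(
        r"\b{}\([^)]*\)".format(re.escape(name)) for name in functions
    )
    # The capturing group makes re.split return the matched calls as items;
    # plain comma matches contribute None, which is filtered out below.
    pattern = re.compile(r"({})|,".format(calls))
    return [part for part in pattern.split(text) if part]


allowed = ["NEAR", "FOLLOWEDBY", "AND", "OR", "MAX"]
print(split_keeping_functions(
    'test,Test,FOLLOWEDBY(this,that,DISTANCE=4),test again,"another test"',
    allowed,
))
# => ['test', 'Test', 'FOLLOWEDBY(this,that,DISTANCE=4)', 'test again', '"another test"']
```

A name that is not in the list gets no protection, so 'a,OTHER(b,c),d' is split at every comma.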
timgeb