
I'm trying to split an extremely long string by commas. I have two requirements, however:

  1. the comma cannot be followed by a space
  2. the comma cannot be followed by a '+' symbol

So, for example, given the input:

text = "hello,+how are you?,I am fine, thanks"

the output should be:

['hello,+how are you?', 'I am fine, thanks']

That is, the only comma that separates the values is the one followed by neither a '+' nor a space.

I have handled requirement 1 as follows:

re.split(r',(?=[^\s]+)',text)

but I cannot figure out how to add requirement 2.

Georgy
Callum Brown
  • this might be helpful https://stackoverflow.com/questions/31201690/find-word-not-followed-by-a-certain-character/31201710#31201710 – Uuuuuumm Jul 24 '20 at 14:27
  • @mk no need to escape anything: `re.split(r',(?=[^\s+])',text)` == `['hello,+how are you?', 'I am fine, thanks']` – Patrick Artner Jul 24 '20 at 14:27
  • `re.split(r',(?![+ ])', text)` <-- Negative lookahead. Matches any `,` unless it is followed by a `+` character or a space. – Hampus Larsson Jul 24 '20 at 14:28
  • @Anthony, doesn't work, result is: `['hello,+how are you?', ' am fine, thanks']` – ipj Jul 24 '20 at 14:35

3 Answers


The simplest solution is to look only for the pattern you don't want and exclude it altogether. You do that with a negative lookahead in the regular expression.

>>> import re
>>> text = "hello,+how are you?,I am fine, thanks"
>>> re.split(r',(?![+ ])', text)
['hello,+how are you?', 'I am fine, thanks']

This matches a comma unless it is followed by either a literal + or a space.
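A sketch of the same negative lookahead wrapped in a reusable function. The helper name split_on_bare_commas is hypothetical, and the pattern widens the literal space to \s so that a tab or newline after a comma is also protected; that widening is an assumption about what "a space" should mean for your data.

```python
import re

# Split on a comma unless the next character is '+' or any whitespace.
# The negative lookahead (?!...) inspects the next character without
# consuming it, so the '+' or whitespace stays in the resulting piece.
# Using \s instead of a literal space is an assumption (tabs, newlines too).
_BARE_COMMA = re.compile(r',(?![+\s])')


def split_on_bare_commas(text: str) -> list:
    """Hypothetical helper: split text on commas not followed by '+' or whitespace."""
    return _BARE_COMMA.split(text)


print(split_on_bare_commas("hello,+how are you?,I am fine, thanks"))
# ['hello,+how are you?', 'I am fine, thanks']
print(split_on_bare_commas("a,\tb,c"))
# ['a,\tb', 'c']
```

Compiling the pattern once is optional (re caches recently used patterns), but it makes the intent explicit when the split is called repeatedly.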

Hampus Larsson

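Try this positive lookahead, which splits only on a comma followed by a character that is neither whitespace nor '+':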

re.split(r',(?=[^\s +])',text)
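A sketch contrasting this positive lookahead with the negative lookahead from the other answer. The two agree on the example text, but they differ for a comma at the very end of the string: (?=[^\s+]) requires some character to follow, so a trailing comma is not a separator, while (?![\s+]) is satisfied when nothing follows. The trailing comma in the input below is an assumption added to show that case. (The space inside [^\s +] is redundant, because \s already matches a space.)

```python
import re

text = "hello,+how are you?,I am fine, thanks,"

# Positive lookahead: the comma must be followed by a character that is
# neither whitespace nor '+'. At the end of the string there is no next
# character, so the trailing comma is NOT treated as a separator.
positive = re.split(r',(?=[^\s+])', text)

# Negative lookahead: the comma must NOT be followed by whitespace or '+'.
# "Nothing follows" satisfies that, so the trailing comma IS a separator.
negative = re.split(r',(?![\s+])', text)

print(positive)  # ['hello,+how are you?', 'I am fine, thanks,']
print(negative)  # ['hello,+how are you?', 'I am fine, thanks', '']
```

Which behaviour is right depends on whether a trailing comma should produce an empty last field.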
Pratyaksh Saini

I suggest you go with @HampusLarsson's answer, but I'd like to squeeze in an answer that doesn't import any modules:

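s = "hello,+how are you?,I am fine, thanks"

# Indices of the separating commas (followed by neither ' ' nor '+').
# s[i+1:i+2] is '' for a comma at the very end, so that case no longer raises
# IndexError; -1 stands in for "just before the start of the string".
ind = [-1]+[i for i,v in enumerate(s)
            if v == ',' and s[i+1:i+2] not in [' ','+']]

# Each part runs from just after one separating comma up to the next one.
# Slicing from i+1 (instead of lstrip(',')) keeps a part like ',+b' intact.
parts = [s[i+1:j]
         for i,j in zip(ind, ind[1:]+[None])]

print(parts)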

Output:

['hello,+how are you?', 'I am fine, thanks']
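Since the question mentions an extremely long string, here is a sketch of the same import-free idea as a single left-to-right pass that yields each part as soon as its separating comma is found, instead of first building the list of indices. The generator name iter_parts is hypothetical; like the answer above, it treats only a literal space or '+' after a comma as protecting it.

```python
def iter_parts(s):
    """Hypothetical helper: yield pieces of s split on commas not followed by ' ' or '+'."""
    start = 0  # index where the current piece begins
    for i, ch in enumerate(s):
        # s[i+1:i+2] is '' at the end of the string, so a trailing comma splits.
        if ch == ',' and s[i + 1:i + 2] not in (' ', '+'):
            yield s[start:i]
            start = i + 1
    yield s[start:]  # the piece after the last separating comma


print(list(iter_parts("hello,+how are you?,I am fine, thanks")))
# ['hello,+how are you?', 'I am fine, thanks']
```

It produces the same pieces as re.split(r',(?![+ ])', s), so the regex answer remains the simpler choice when an import is acceptable.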
Red