
I am trying to split a string using a regular expression (`re.split`), but it's been a while since I've used regular expressions.

The string looks like:

string = '"first, element", second element, third element, "fourth, element", fifth element'

I would like to split the string on each comma, unless that comma is inside a substring enclosed in quotes.

The output should look like this:

output = ['"first, element"', 'second element', 'third element', '"fourth, element"', 'fifth element']
edited by Jerry · asked by JustMe
  • Any attempts so far? Also, [related](http://stackoverflow.com/q/21261314/1578604). IMO, it's easier to use match though. – Jerry Aug 26 '14 at 08:49
  • Regex is the wrong way to approach this problem. Quoted strings can have escape characters, you should use `shlex.split` instead. – simonzack Aug 26 '14 at 08:53
  • Or parse it as csv using the csv module - because that's what it really is. – slebetman Aug 26 '14 at 08:54
  • See also [How do I split a line by commas but ignore commas within quotes?](http://stackoverflow.com/q/7682561), [How do I split a comma delimited string in python except for the commas that are within quotes?](http://stackoverflow.com/q/4982531), [How to split but ignore separators in quoted strings?](http://stackoverflow.com/q/2785755), [and more](https://www.google.com/search?ie=utf8&oe=utf8&q=python+split+commas+except+inside+quotes&nfpr=1&gws_rd=ssl). – jscs Aug 26 '14 at 08:55
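Following Jerry's suggestion that matching is easier than splitting, here is a minimal sketch that describes what a *field* looks like instead of what a separator looks like. It assumes quotes are balanced and never escaped; the names `field` and `fields` are only for illustration.

```python
import re

string = '"first, element", second element, third element, "fourth, element", fifth element'

# A field is one or more pieces, where each piece is either a complete
# "quoted chunk" or a single character that is neither a comma nor a quote.
# A comma outside quotes therefore ends the current match.
field = re.compile(r'(?:"[^"]*"|[^,"])+')

# Strip the space that follows each separating comma.
fields = [m.strip() for m in field.findall(string)]
print(fields)
# ['"first, element"', 'second element', 'third element', '"fourth, element"', 'fifth element']
```

One difference from splitting: because each match needs at least one character, empty fields (as in `a,,b`) are silently dropped.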

2 Answers


You want to use the csv module instead of reinventing it.

answered by aecolley
  • I was trying to create an example using `csv.reader`, but no luck so far. It looks like it can't cope with some items being wrapped in quotes, and some not. – Bjorn Aug 26 '14 at 08:57
  • thanks @Bjorn, I was able to parse the line using the csv module – JustMe Aug 26 '14 at 09:10
  • @Bjorn - perhaps something else is going wrong, because the `csv` module most certainly can parse lines where some fields have the delimiter contained within quotes, whereas other lines have no quotes whatsoever. – dwanderson Sep 06 '16 at 19:28
  • Didn't work for a combination of commas, colons and escaping: 'categoryFilter: ["5"],priceRanges: "[{\\"min\\":0,\\"max\\":667}]"' – Nir O. Sep 01 '21 at 00:39

You could try the code below:

>>> import re
>>> string = '"first, element", second element, third element, "fourth, element", fifth element'
>>> m = re.split(r', (?=(?:"[^"]*?(?: [^"]*)*))|, (?=[^",]+(?:,|$))', string)
>>> m
['"first, element"', 'second element', 'third element', '"fourth, element"', 'fifth element']

Regex stolen from here :-)
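The pattern above splits at `, ` only when it is followed by a `"`, or by a run of characters containing no quote or comma that ends at a comma or at the end of the string. A more general alternative (a different, commonly used pattern, not the one from this answer) splits at a comma only if an even number of double quotes follows it, meaning the comma is outside any quoted section. This is a sketch that assumes quotes are balanced and never escaped; it also rescans the rest of the line at every comma, so it is quadratic on long lines.

```python
import re

string = '"first, element", second element, third element, "fourth, element", fifth element'

# Match a comma plus any following whitespace, but only when the lookahead
# can consume the rest of the string as pairs of quotes followed by
# quote-free text, i.e. an even number of '"' remain after the comma.
pattern = re.compile(r',\s*(?=(?:[^"]*"[^"]*")*[^"]*$)')

parts = pattern.split(string)
print(parts)
# ['"first, element"', 'second element', 'third element', '"fourth, element"', 'fifth element']
```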

answered by Avinash Raj
  • I have marked this as the solution, as it is technically the answer to my question, although I agree that the csv module is the way to go here. – JustMe Aug 26 '14 at 09:09