Remove All Commas Between Quotes

Question

I'm trying to remove all commas that are inside double quotes (") with Python:

'please,remove all the commas between quotes,"like in here, here, here!"'
                                                          ^     ^

I tried this, but it only removes the first comma inside the quotes:

re.sub(r'(".*?),(.*?")',r'\1\2','please,remove all the commas between quotes,"like in here, here, here!"')

Output:

'please,remove all the commas between quotes,"like in here here, here!"'

How can I make it remove all the commas inside the quotes?

Do you have to use regex? Or it can be something else, like string manipulation? — gabra, Jul 12 '16 at 18:41
@gabra Anything works for me. Just as long as it gets the job done ;) — carloabelli, Jul 12 '16 at 18:42

anubhava · Accepted Answer · 2016-07-12T18:53:43.740

20

Assuming you don't have unbalanced or escaped quotes, you can use this regex based on negative lookahead:

>>> import re
>>> s = r'foo,bar,"foobar, barfoo, foobarfoobar"'
>>> re.sub(r'(?!(([^"]*"){2})*[^"]*$),', '', s)
'foo,bar,"foobar barfoo foobarfoobar"'

This regex matches a comma only when it is inside double quotes: the negative lookahead asserts that the comma is NOT followed by an even number of quotes.

Note about the lookahead (?!...):

([^"]*"){2} matches a pair of quotes
(([^"]*"){2})* matches 0 or more pairs of quotes
[^"]*$ makes sure there are no more quotes after the last matched quote
So (?!...) asserts that the number of quotes ahead is not even, which means only commas inside a quoted string are matched.
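
The breakdown above can be sketched as a commented pattern using re.VERBOSE. This is only an illustration under the same assumption of balanced, unescaped quotes; the names COMMA_INSIDE_QUOTES and strip_commas_in_quotes are made up for this example, and the inner group is written as non-capturing, which does not change what is matched.

```python
import re

# Hypothetical names; the pattern is the one from the answer, spelled out.
COMMA_INSIDE_QUOTES = re.compile(r'''
    (?!                  # reject this position if the rest of the string is ...
        (?:
            (?:[^"]*"){2}   # ... a pair of quotes,
        )*                  # ... repeated 0 or more times,
        [^"]*$           # ... followed by no further quotes up to the end
    )
    ,                    # otherwise an odd number of quotes is ahead: match the comma
''', re.VERBOSE)


def strip_commas_in_quotes(text):
    """Remove commas that sit inside double-quoted sections of text."""
    return COMMA_INSIDE_QUOTES.sub('', text)


print(strip_commas_in_quotes('a,"b,c",d,"e,f,g",h'))
```

Commas outside the quotes have an even number of quotes after them, so the lookahead rejects them and they are kept.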

edited Jul 12 '16 at 18:53

answered Jul 12 '16 at 18:44

anubhava


1

Good news is all my quotes are balanced! Thanks! – carloabelli Jul 12 '16 at 18:45
2

it seems to work for me with multiple commas ... this is `re.magic` I hate regex in general ... but you sir are a genius with it – Joran Beasley Jul 12 '16 at 18:48
If you have time I'd be fascinated to know how on earth it works haha – carloabelli Jul 12 '16 at 18:49
1

Here's a [regex101](https://regex101.com/r/sF6nX0/1) with this. – Brendan Abel Jul 12 '16 at 18:53
1

@anubhava Thanks for the explanation as well. – carloabelli Jul 12 '16 at 18:55
Still will always be the wrong answer. To assume balanced quotes is ridiculous. Worse still, it takes 10 seconds to do just 55 lines of the sample. Given you are looking ahead to the end of file at every character position, it's exponentially like backtracking. This is probably the worst way to do this. – Jul 12 '16 at 20:00
1

Thank you! This is brilliant for dealing with InfluxDB's "inconsistent" quoting of values. – user2460464 Jul 07 '17 at 16:14
1

Wow, this works like a charm. I don't like regex myself, as it is very confusing, but some people are very fond of it. Thanks for the help. – user3341078 Dec 23 '18 at 18:51
1

@anubhava, Thank you so much sir, your solution really helped me. – Pyd Mar 03 '22 at 18:08

Brendan Abel · Answer 2 · 2016-07-12T19:09:39.197

3

You can pass a function as the repl argument instead of a replacement string. Just get the entire quoted string and do a simple string replace on the commas.

>>> import re
>>> s = 'foo,bar,"foobar, barfoo, foobarfoobar"'
>>> re.sub(r'"[^"]*"', lambda m: m.group(0).replace(',', ''), s)
'foo,bar,"foobar barfoo foobarfoobar"'
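
For reuse, the same idea could be wrapped in a small helper with a named callback instead of a lambda. This is a rough sketch, assuming balanced quotes with no escaped quotes inside them; QUOTED, _drop_commas and remove_commas_in_quotes are names invented for the example.

```python
import re

# Matches one complete double-quoted section (no escaped quotes assumed).
QUOTED = re.compile(r'"[^"]*"')


def _drop_commas(match):
    # re.sub calls this once per match and substitutes the returned string.
    return match.group(0).replace(',', '')


def remove_commas_in_quotes(text):
    """Remove commas inside every double-quoted section of text."""
    return QUOTED.sub(_drop_commas, text)


print(remove_commas_in_quotes('a,"b,c",d,"e,f,g",h'))
```

Each quoted section is matched once and cleaned with a plain str.replace, so the regex itself never has to look ahead to the end of the string.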

edited Jul 12 '16 at 19:09

answered Jul 12 '16 at 18:45

Brendan Abel


score 1 · Answer 3 · answered Jul 12 '16 at 19:52

Here is another option I came up with if you don't want to use regex.

input_str = 'please,remove all the commas between quotes,"like in here, here, here!"'

def noCommas(string):
    quotes = False  # True while we are inside a quoted section
    output = ''
    for char in string:
        if char == '"':
            quotes = not quotes  # toggle on every quote, so closing quotes are handled
        if char != ',' or not quotes:
            output += char  # keep everything except commas inside quotes
    return output

print(noCommas(input_str))

score 0 · Answer 4 · answered Jul 12 '16 at 18:49

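What about doing it without regex?

input_str = '...'

# Splitting on '"' puts the quoted text at the odd indexes (assuming balanced quotes)
first_slice = input_str.split('"')

second_slice = []
for i, slc in enumerate(first_slice):
    # Strip commas only from the quoted (odd-index) segments
    second_slice.append(slc.replace(',', '') if i % 2 else slc)

# Put back the quote characters that split() consumed
result = '"'.join(second_slice)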

answered Jul 12 '16 at 18:49

Dan


score 0 · Answer 5 · answered Sep 25 '22 at 18:41

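Looping over the string character by character, as in the answer above, is very slow if you want to process something like a 5 MB CSV file.

The approach below seems to be reasonably fast and gives the same result as the for loop. Note that this example uses ';' as the delimiter and replaces it with ',' inside the quotes, rather than removing commas: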

#!/bin/python3

data = 'hoko foko; moko soko; "aaa mo; bia"; "ee mo"; "eka koka"; "koni; masa"; "co co"; ehe mo; "bi; ko"; ko ma\n "ka ku"; "ki; ko"\n "ko;ma"; "ki ma"\n"ehe;";koko'

first_split=data.split('"')
split01=[]
split02=[]
for slc in first_split[0::2]:
    split01.append(slc)
for slc in first_split[1::2]:
    slc_new=",".join(slc.split(";"))
    split02.append(slc_new)

resultlist = [item for sublist in zip(split01, split02) for item in sublist]
if len(split01) > len(split02):
    resultlist.append(split01[-1])
if len(split01) < len(split02):
    resultlist.append(split02[-1])
   
result='"'.join(resultlist)
print(data)
print(split01)
print(split02)
print(result)

Results in:

hoko foko; moko soko; "aaa mo; bia"; "ee mo"; "eka koka"; "koni; masa"; "co co"; ehe mo; "bi; ko"; ko ma
 "ka ku"; "ki; ko"
 "ko;ma"; "ki ma"
"ehe;";koko
['hoko foko; moko soko; ', '; ', '; ', '; ', '; ', '; ehe mo; ', '; ko ma\n ', '; ', '\n ', '; ', '\n', ';koko']
['aaa mo, bia', 'ee mo', 'eka koka', 'koni, masa', 'co co', 'bi, ko', 'ka ku', 'ki, ko', 'ko,ma', 'ki ma', 'ehe,']
hoko foko; moko soko; "aaa mo, bia"; "ee mo"; "eka koka"; "koni, masa"; "co co"; ehe mo; "bi, ko"; ko ma
 "ka ku"; "ki, ko"
 "ko,ma"; "ki ma"
"ehe,";koko
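
The split-and-rejoin idea can probably be condensed with slice assignment. The following is a minimal sketch, assuming balanced quotes; the function name replace_inside_quotes and its parameters are hypothetical and not taken from the answer.

```python
def replace_inside_quotes(text, old, new, quote='"'):
    """Replace old with new only inside quoted sections of text.

    Assumes quotes are balanced, so after splitting on the quote
    character the odd-indexed parts are the quoted ones.
    """
    parts = text.split(quote)
    # Rewrite only the odd-indexed (quoted) parts, in place.
    parts[1::2] = [part.replace(old, new) for part in parts[1::2]]
    # Re-insert the quote characters that split() removed.
    return quote.join(parts)


# Semicolons to commas inside quotes, as in the example data above.
print(replace_inside_quotes('a;b;"c;d";e', ';', ','))
# Removing commas inside quotes, as in the original question.
print(replace_inside_quotes('foo,bar,"foobar, barfoo, foobarfoobar"', ',', ''))
```

Passing an empty string as the replacement covers the original question, while the ';' to ',' case matches the data used in this answer.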
