3

I'm trying to find a way to replace all commas in csv file that occur between two double quotes with pipes using python.

Given such input:

abc,def,"ghi,jkl,mno",pqr,stu

I would like to get:

abc,def,"ghi|jkl|mno",pqr,stu

I have tried to use positive lookarounds with something similar to:

(?<=\")(this here should match every comma)(?=\") but I can't get it to work. Any ideas?

DeepSpace
  • 78,697
  • 11
  • 109
  • 154
Łukasz Biały
  • 431
  • 5
  • 14
  • 1
    Why not read this string as is with `csv` parser and then just `replace(',', '|')`? BTW, it is not a case when you may use lookarounds, you need to match valid double quoted fields, and it might not be as easy as `"[^"]+"`. To do that properly, you should know how double quotes inside fields are escaped. – Wiktor Stribiżew May 08 '18 at 09:56

2 Answers2

5

Use re.sub with anonym or lambda function in the replacement part.

>>> import re
>>> s = 'abc,def,"ghi,jkl,mno",pqr,stu'
>>> re.sub(r'"[^"]+"', lambda x: x.group().replace(',', '|'), s)
'abc,def,"ghi|jkl|mno",pqr,stu'

Note: this won't handle escaped quotes and assume all the double quotes are properly balanced.

Avinash Raj
  • 172,303
  • 28
  • 230
  • 274
0

Could a simpler approach like this work for you:

string = 'abc,def,"ghi|jkl|mno",pqr,stu'
string_splited = string.split('"')
string_splited[1] = string_splited[1].replace(',', '|')

That would give you a list:

>>> string_splited
['abc,def,', 'ghi|jkl|mno', ',pqr,stu']

That you could reassemble

>>> '"'.join(string_splited)
'abc,def,"ghi|jkl|mno",pqr,stu'
Rafał Gajda
  • 151
  • 6