2

In Notepad++, I am using Regex to replace commas between quotes in CSV file.

Using similar example from here.This is what I am trying to read.

1070,17,2,GN3-670,"COLLAR B, M STAY,","2,606.45" 

except in my text there is an extra comma right before the closing quotes.

The regex ("[^",]+),([^"]+") does not seem to pick up the last comma and result is

1070,17,2,GN3-670,"COLLAR B M STAY,","2606.45"

I would like

1070,17,2,GN3-670,"COLLAR B M STAY","2606.45"

Is there a simple Regex or will I have to use csv reader C#?

Edit: Some of the Regex is giving false matches so I would like to add another scenario. If I have

1070,17,2,GN3-670,"COLLAR B, M STAY,",55, FREE,"2,606.45"

I would like

1070,17,2,GN3-670,"COLLAR B M STAY",55, FREE,"2606.45"
Community
  • 1
  • 1
katie
  • 23
  • 5

3 Answers3

1

I think this is what you're looking for:

,(?=[^"]*"(?:[^"]*"[^"]*")*[^"]*$)

This matches any comma that's followed by an odd number of quotes. It consumes only the comma, so you replace it with nothing.

The thing about your original solution is that it would only match one comma per quoted field. It never even tried to match the second comma in "COLLAR B, M STAY,", so its position didn't really matter. This solution removes any number of commas, regardless of their position within the field.

UPDATE: This regex assumes you're processing one line at a time. If you're using it on a whole document containing many lines, the regex is probably timing out. You can work around that by excluding line terminators (carriage returns and linefeeds), like this:

,(?=[^"\r\n]*"(?:[^"\r\n]*"[^"\r\n]*")*[^"\r\n]*$)

Note that the CSV spec (such as it is) says you can have line terminators in quoted fields, so this regex is technically incorrect. If you do need to support multiline fields, you might as well switch to the CSV library. Regexes are not quite capable of handling CSV fully, but in most cases they're good enough.

Alan Moore
  • 73,866
  • 12
  • 100
  • 156
  • that works perfect in the tester, but am new to this. It doesnt work in notepad++, is there an easy way to translate it? – katie Sep 23 '15 at 14:46
  • Translation shouldn't be necessary; it's probably the regex itself that needs tweaking. Check my update. – Alan Moore Sep 23 '15 at 15:33
0

You can use the following to match:

((["])(?:(?=(\\?))\3.)*?),\2

And replace with the following:

\1"

See DEMO

karthik manchala
  • 13,492
  • 1
  • 31
  • 55
0

This should work

Find What ("[^"]*),"

Replace With \1"

Anirudha
  • 32,393
  • 7
  • 68
  • 89