15

I am trying to remove commas inside double quotes from a csv file in notepad++, this is what I have:

1070,17,2,GN3-670,"COLLAR B, M STAY","2,606.45"

and I need this:

1070,17,2,GN3-670,"COLLAR B M STAY","2606.45"

I ma trying to use notepad find/replace option with a reg exp. pattern. I tried all kind of combination but didn't manage to do :( The file contains 1 million rows.

After whole today I am not anymore sure if a simple regex can do? Maybe I should go with a script...python?

Upment
  • 1,581
  • 3
  • 15
  • 28

4 Answers4

37

mrki, this will do what you want (tested in N++):

Search: ("[^",]+),([^"]+")

Replace: $1$2 or \1\2

How does this work? The first parentheses capture the beginning of the string up to (but not including) the comma into Group 1. The second parentheses capture the end of the string after the comma into Group 2. The replacement substitutes the string with a concatenation of Group 1 and Group 2.

In more detail: in the first parentheses, we match the opening double quotes then any number of characters that are not a comma. That is the meaning of [^,]+. In the second parentheses, we match any number of characters that are not a double quote with [^"]+, then the closing double quotes .

zx81
  • 41,100
  • 9
  • 89
  • 105
  • @mrki FYI expanded the explanation, please let me know if this works for you. – zx81 Apr 21 '14 at 20:52
  • 2
    To be safer, I would use `("[^",]+),([^"]*")` because both double quote need attention **before** and **after** the comma – Mat M Apr 21 '14 at 20:52
  • Replace: $1$2 just converts "RING, R ALL (02U)" into $1$2 ?? – Upment Apr 21 '14 at 20:53
  • 1
    @mrki That is probably because you need to click the "Regular expression" checkbox at the bottom left of the Replace tab in N++. :) – zx81 Apr 21 '14 at 20:55
  • 1
    @mrki You need to [use regular expression in the search options](http://i.stack.imgur.com/xA1Hi.png) – HamZa Apr 21 '14 at 20:56
  • 1
    @HamZa nice screenshot. :) – zx81 Apr 21 '14 at 20:58
  • @HamZa yes I have that enabled since this morning...I am was after that regex all day...weird? – Upment Apr 21 '14 at 20:58
  • @mrki Yes that's weird... Are you able to link to a screenshot of your Replace box? – zx81 Apr 21 '14 at 21:00
  • @mrki Wow, that looks perfectly normal. Hey, by the way, please note a small edit in the regex in response to Mat's suggestion. I think the " in the first character class makes it more solid. Okay, a bit lost about why your N++ gives you literal replacements. Maybe a localization thing? Can you please try \1\2 in the replacement to see if that works? – zx81 Apr 21 '14 at 21:13
  • @mrki FYI edited the answer to add `\1\2` as an option for the replacement. There could be a language/locale glitch with `$1$2` – zx81 Apr 21 '14 at 21:16
  • @mrki Glad to hear it, nice talking to you. :) – zx81 Apr 21 '14 at 21:45
  • What would be the Regex just to get the comma itself? Thanks. – PeterX Feb 18 '15 at 04:45
  • This also works in java regex `myUpdatedString = myString.replaceAll("(\"[^\",]+),([^\"]*\")", "$1$2")` – desaiankitb May 18 '17 at 11:13
  • I found the marked solution missed occurrences of strings where the comma was before the final double quote (e.g. "New York,") but the ("[^",]+),([^"]*") suggestion in the comment found them all. – Doug Lerner Jun 24 '17 at 00:40
  • Nice, thanks! For multiple commas it is needed to run the search replace several times, tough. – mio Dec 18 '17 at 09:38
15

Try the following

import re

print re.sub(',(?=[^"]*"[^"]*(?:"[^"]*"[^"]*)*$)',"",string)

This will remove comma between quotes

Roman Marusyk
  • 23,328
  • 24
  • 73
  • 116
Nithin
  • 171
  • 1
  • 3
6

Just an update to @zx81's brilliant solution. Lets say you have 2commas between quotes

Then the regex search has to be modified as follows:

("[^",]+),([^",]+),([^"]+")

Replace needs to be modified as

$1$2$3

So on modify it depending on the # of commas.

I tried exploring to see if recursive regex was possible but the does not seem to be possible as of now

Anand Sunderraman
  • 7,900
  • 31
  • 90
  • 150
  • Old post, but could be interesting for others with a similar issue. See alpha bravo's answer here https://stackoverflow.com/questions/26664371/remove-more-than-one-comma-in-between-quotes-in-csv-file-using-regex?rq=1 – Tom Feb 07 '18 at 10:32
0

For a line with multiple instances of "comma within double quotes", I can think of the following perl script - you need to have a header line without this kind of instance so that you know how many comma-separated fields there should be.

#! /usr/bin/perl -w

use strict;

my $n_fields = "";
while (<>) {
    s/\s+$//;
    if (/^\#/) { # header line
        my @t = split(/,/);
        $n_fields = scalar(@t); # total number of fields
    } else { # actual data
        my $n_commas = $_ =~s/,/,/g; # total number of commas
        foreach my $i (0 .. $n_commas - $n_fields) { # iterate ($n_commas - $n_fields + 1) times
            s/(\"[^",]+),([^"]+\")/$1\\x2c$2/g; # single replacement per previous answers
        }
        s/\"//g; # removal of double quotes (if you want)
    }
    print "$_\n";
}
Yiran Guo
  • 9
  • 2