2

I have a string in the following format in a comma delimited file:

someText, "Text with, delimiter", moreText, "Text Again"

What I need to do is create a method that will look through the string, and will replace any commas inside of quoted text with a dollar sign ($).

After the method, the string will be:

someText, "Text with$ delimiter", moreText, "Text Again"

I'm not very experienced with regex, but I would like to know how I can use a regular expression to search for a pattern (a comma between quotes) and then replace that comma with a dollar sign.

abatishchev
5StringRyan
  • 3
    This looks like CSV. Is that just a coincidence? If this is CSV, you should know that CSV is not a 'regular' language and thus cannot be completely and correctly parsed via a regular expression in all cases. See comments and answers to this question: http://stackoverflow.com/questions/1189416/c-regular-expressions-how-to-parse-comma-separated-values-where-some-values – Daniel Pratt Jul 21 '11 at 00:25
  • If this is just a hack on the way to `Split(',')`, you should certainly use a CSV parser. What would you do if the string contained a `$`, by the way (`1,2,"$5.4",6`)? – Kobi Jul 21 '11 at 04:45
  • @Daniel - Actually, valid CSV *is* a regular language (as long as you don't require that all rows have the same, unknown number of columns). It doesn't contain any nesting or any context to consider. – Kobi Jul 21 '11 at 04:52

5 Answers

3

Personally, I'd avoid regexes here - assuming that there aren't nested quote marks, this is quite simple to write up as a for-loop, which I think will be more efficient:

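var inQuotes = false;
var sb = new StringBuilder(someText.Length);

for (var i = 0; i < someText.Length; ++i)
{
    // Each quote toggles whether we are inside quoted text.
    if (someText[i] == '"')
    {
        inQuotes = !inQuotes;
    }

    // Replace commas only while inside quotes; copy everything else as-is.
    if (inQuotes && someText[i] == ',')
    {
        sb.Append('$');
    }
    else
    {
        sb.Append(someText[i]);
    }
}

var result = sb.ToString();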
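
For reuse, the loop can be wrapped in a method. This is only a minimal sketch: the class name `QuotedCommaReplacer`, the method name `ReplaceQuotedCommas`, and the optional `placeholder` parameter are made up for illustration. Because every quote character flips the flag, a doubled quote (`""`) inside a quoted field flips it twice and leaves the state unchanged.

```csharp
using System.Text;

// Hypothetical helper; the names are illustrative, not from the answer above.
public static class QuotedCommaReplacer
{
    public static string ReplaceQuotedCommas(string line, char placeholder = '$')
    {
        var inQuotes = false;
        var sb = new StringBuilder(line.Length);

        foreach (var c in line)
        {
            if (c == '"')
            {
                inQuotes = !inQuotes;   // entering or leaving quoted text
            }

            // Swap only the commas that sit between an opening and a closing quote.
            sb.Append(inQuotes && c == ',' ? placeholder : c);
        }

        return sb.ToString();
    }
}
```

Calling `QuotedCommaReplacer.ReplaceQuotedCommas(line)` on the sample line from the question should give the string with `Text with$ delimiter` in the second field.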
Ben
  • Yeah, given the sheer number of possible patterns to match, I was afraid that regexes wouldn't be an option. However, this is a pretty good algorithm for stepping through the string itself. – 5StringRyan Jul 21 '11 at 16:59
  • @Hans Gruber - It's actually pretty easy with a regular expression. `Regex.Replace` allows you to provide a delegate for doing the replacement once you've found the match, as shown in my answer. – Ergwun Jul 22 '11 at 00:31
1

This is the type of problem where Regex fails; do this instead:

    var sb = new StringBuilder(str);

    var insideQuotes = false;

    for (var i = 0; i < sb.Length; i++)
    {
        switch (sb[i])
        {
            case '"':
                insideQuotes = !insideQuotes;
                break;
            case ',':
                if (insideQuotes)
                    sb[i] = '$';
                break;
        }
    }

    str = sb.ToString();

You can also use a CSV parser to parse the string and write it again with replaced columns.
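
As a rough sketch of the parser route, the following uses `TextFieldParser` from `Microsoft.VisualBasic.FileIO`, which ships with .NET. The class and method names are hypothetical, and the sketch assumes a single line with no embedded line breaks. Note that the parser strips the surrounding quotes, so the rewritten line is not identical to the output requested in the question; once the commas inside each field are replaced, the fields can be joined without quoting.

```csharp
using System.IO;
using System.Linq;
using Microsoft.VisualBasic.FileIO;

// Hypothetical helper; assumes one CSV record per call.
public static class CsvLineRewriter
{
    public static string ReplaceCommasInFields(string line)
    {
        using (var parser = new TextFieldParser(new StringReader(line)))
        {
            parser.SetDelimiters(",");
            parser.HasFieldsEnclosedInQuotes = true;

            // ReadFields returns the field values with enclosing quotes removed.
            string[] fields = parser.ReadFields() ?? new string[0];

            // After the replacement no field contains a comma, so a plain join is safe.
            return string.Join(",", fields.Select(f => f.Replace(',', '$')));
        }
    }
}
```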

eulerfx
1

Here's how to do it with Regex.Replace:

        string output = Regex.Replace(
            input,
            "\".*?\"",
            m => m.ToString().Replace(',', '$'));

Of course, if you want to ignore escaped double quotes, it gets more complicated, especially when the escape character can itself be escaped.

Assuming the escape character is \, then when trying to match the double quotes, you'll want to match only quotation marks which are preceded by an even number of escape characters (including zero). The following pattern will do that for you:

string pattern = @"(?<=((^|[^\\])(\\\\){0,}))"".*?(?<=([^\\](\\\\){0,}))""";

At this point, you might prefer to abandon regular expressions ;)
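
For illustration, here is one way the escape-aware pattern could be plugged into `Regex.Replace`. The wrapper class and method names are hypothetical, and the sketch assumes `\` is the escape character as described above.

```csharp
using System.Text.RegularExpressions;

// Hypothetical wrapper around the escape-aware pattern; assumes \ is the escape character.
public static class EscapedQuoteReplacer
{
    // The opening and closing quotes must each follow an even number of backslashes.
    private const string Pattern =
        @"(?<=((^|[^\\])(\\\\){0,}))"".*?(?<=([^\\](\\\\){0,}))""";

    public static string ReplaceQuotedCommas(string input)
    {
        // The delegate rewrites each quoted match, leaving text outside quotes alone.
        return Regex.Replace(
            input,
            Pattern,
            m => m.Value.Replace(',', '$'));
    }
}
```

With this sketch, a quote preceded by a single backslash is skipped when looking for the end of the quoted text, so a comma that follows an escaped quote inside the field is still replaced.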

UPDATE:

In reply to your comment, it is easy to make the operation configurable for different quotation marks, delimiters and placeholders.

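        string quote = "\"";
        string delimiter = ",";
        string placeholder = "$";

        // Escape the quote in case it is a regex metacharacter (e.g. | or .).
        string escapedQuote = Regex.Escape(quote);

        string output = Regex.Replace(
            input,
            escapedQuote + ".*?" + escapedQuote,
            m => m.ToString().Replace(delimiter, placeholder));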
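
A sketch of the same idea as a reusable method, in case the quote, delimiter and placeholder come from user input. The class and method names are hypothetical. The quote is passed through `Regex.Escape` so that a user-supplied quote such as `|` or `.` is matched literally rather than interpreted as a regex metacharacter; the delimiter needs no escaping because it is only used in `string.Replace`.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper; parameter names mirror the variables used above.
public static class ConfigurableReplacer
{
    public static string ReplaceInsideQuotes(
        string input, string quote, string delimiter, string placeholder)
    {
        // Escape the quote so metacharacters such as | or . match literally.
        string q = Regex.Escape(quote);

        return Regex.Replace(
            input,
            q + ".*?" + q,
            m => m.Value.Replace(delimiter, placeholder));
    }
}
```

For example, `ConfigurableReplacer.ReplaceInsideQuotes(line, "|", ";", "$")` treats `|` as the quote character and `;` as the delimiter.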
Ergwun
  • Hmm... let's say that I wanted to allow the user to specify the delimiter of the file (anything other than a comma), and specify the quote as well. How would I change this regex to be dynamic? – 5StringRyan Aug 03 '11 at 16:03
0

If you'd like to go the regex route, here's what you're looking for:

var result = Regex.Replace( text, "(\"[^,]*),([^,]*\")", "$1$$$2" );

The problem with this regex is that it won't catch quoted text containing more than one comma, such as "this, has, two commas".

greg-449
Paul Alexander
  • This won't work for: someText, ""Text with, delimiter"", ""text,comma"", moreText, ""Text Again"", ""text,comma"" – eulerfx Jul 21 '11 at 00:32
-2

Can you give this a try: "[\w ],[\w ]" (double quotes included)? And be careful with the replacement because direct replacement will remove the whole string enclosed in the double quotes.

Cam L