Suppose I want to escape all double-quotes which are nested within double quotes (picture a CSV or something):

"Jim", "Smythe", "Favorite Quote: "This is my favorite quote.""

I'd like to isolate the inner quotes which surround This is my favorite quote., and then escape them with a \. But I'm having trouble writing a regex to just match on the inner quotes. So, the resulting match I'd like is:

"Jim", "Smythe", "Favorite Quote: "This is my favorite quote.""
                                  ^^                        ^^
                 Start Match Here ||                        || End Match Here
                Start Capture Here |       End Capture Here |

Match:   "This is my favorite quote."
Capture:  This is my favorite quote.

And then I can easily escape the quotes with the pattern \"$1\" to get the end result:

"Jim", "Smythe", "Favorite Quote: \"This is my favorite quote.\""
Josh M.
  • Generally you should escape the value before putting it in the line. If you do it afterwards, you can only handle the cases where you can tell that a quote is an inner quote and not the end of the value. For example, if the inner quote is followed by a comma, you can't distinguish it from a value separator. – Guffa Sep 25 '13 at 18:41
  • My first guess would be that this is not possible due to reasons specified in this question - http://stackoverflow.com/questions/133601/can-regular-expressions-be-used-to-match-nested-patterns – dana Sep 25 '13 at 18:41
  • Something like this: `/"(?:[^\\"]|\\.)*"/`? – Brian Sep 25 '13 at 18:42
  • @Jerry I haven't got any reasonable attempts, and I'm not sure it can be done (at least not reliably). – Josh M. Sep 25 '13 at 18:45
  • Unless there is some definable format to the outer string (like CSV) that can't appear in the inner string (can't an escaped comma appear in the inner string?), this is impossible as you can't differentiate between `"` and `"`. – Bernhard Barker Sep 25 '13 at 18:48
  • @JoshM. Can you also have commas inside the elements? Such as: `"Jim", "Smythe", "Favorite Quote: "This is my favorite quote, of course.""` – Jerry Sep 25 '13 at 18:56
  • A better question is if the quote can start with a comma, like this `"Jim", "Smythe", "Favorite Quote: ", "The comma and this are part of the quote", and so is this"`. – Dour High Arch Sep 25 '13 at 19:31
  • @DourHighArch In that instance, the list is broken beyond repair IMO xD – Jerry Sep 25 '13 at 19:59
  • Thanks for the comments, Jerry's answer seems to get me close enough. I don't expect it to work on all cases but it is a good start and allows me to manually edit less garbage. – Josh M. Sep 25 '13 at 20:00
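
Guffa's suggestion in the comments above, escaping each value before it is joined into the line, can be sketched roughly as follows. The names `CsvLineBuilder`, `QuoteField` and `BuildLine` are hypothetical, and doubling backslashes first (so the escaping stays reversible) is an assumption that goes beyond what the question asks for.

```csharp
using System;
using System.Linq;

public static class CsvLineBuilder
{
    // Hypothetical helper: escape one value, then wrap it in double quotes.
    // Assumption: backslash is the escape character, as in the question.
    public static string QuoteField(string value)
    {
        string escaped = value
            .Replace("\\", "\\\\")   // double existing backslashes first
            .Replace("\"", "\\\"");  // then escape inner double quotes
        return "\"" + escaped + "\"";
    }

    // Join the already-escaped fields with the ", " separator used in the question.
    public static string BuildLine(params string[] values)
    {
        return string.Join(", ", values.Select(QuoteField));
    }
}

public static class BuildLineDemo
{
    public static void Main()
    {
        string line = CsvLineBuilder.BuildLine(
            "Jim", "Smythe", "Favorite Quote: \"This is my favorite quote.\"");
        Console.WriteLine(line);
        // prints: "Jim", "Smythe", "Favorite Quote: \"This is my favorite quote.\""
    }
}
```

Because each value is escaped while it is still a separate string, no guessing about which quotes are "inner" is needed afterwards.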

2 Answers

I suggest:

(?<!^|, )"(?=(?:(?<!"),|[^,])*"(?:,|$))

Replace with \\$0

regex101 demo
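
For C#, a minimal sketch of applying this pattern might look like the following. The class and method names (`InnerQuoteEscaper`, `Escape`) are hypothetical. It assumes fields are separated by exactly a comma and one space, as in the question's sample line. Note that in a verbatim string the inner double quotes are doubled, and that .NET replacement patterns treat a backslash as a literal character, so a single backslash before `$0` is enough here.

```csharp
using System;
using System.Text.RegularExpressions;

public static class InnerQuoteEscaper
{
    // The pattern from this answer, written as a C# verbatim string
    // (each " inside the pattern is written as "").
    private static readonly Regex InnerQuote =
        new Regex(@"(?<!^|, )""(?=(?:(?<!""),|[^,])*""(?:,|$))");

    // Prefix every matched inner quote with a backslash.
    // In .NET only $ is special in a replacement pattern, so \ is literal.
    public static string Escape(string line)
    {
        return InnerQuote.Replace(line, @"\$0");
    }
}

public static class EscapeDemo
{
    public static void Main()
    {
        string input = "\"Jim\", \"Smythe\", \"Favorite Quote: \"This is my favorite quote.\"\"";
        Console.WriteLine(InnerQuoteEscaper.Escape(input));
        // prints: "Jim", "Smythe", "Favorite Quote: \"This is my favorite quote.\""
    }
}
```

The pattern is sensitive to spacing: with an input such as `"Jim" , "Smythe"` (a space before the comma), the quote after `Jim` is also matched and escaped, so the input may need normalizing first.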

Jerry
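  • By the way, in C# you can put `@` in front of the strings so that you don't need as many escapes (inner double quotes are then doubled), i.e. `@"(?<!^|, )""(?=(?:(?<!""),|[^,])*""(?:,|$))"` and `@"\$0"` (a backslash is literal in a .NET replacement pattern, so one is enough). – Jerry Sep 25 '13 at 20:00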
  • In fact, I tested this and it fails slightly: the `"` after `Jim` is matched. I tested with RegexBuddy. – King King Sep 25 '13 at 20:05
  • @KingKing, I tested it with Expresso (based on C# regex engine) and it works for the example I gave. – Josh M. Sep 25 '13 at 20:14
  • @JoshM. Note the **flavor**: the `regex101 demo` appears to use the PHP flavor, and the pattern won't work in the .NET flavor. See this screenshot: https://sites.google.com/site/thecabinet3/home/files-store/testReg.png?attredirects=0 – King King Sep 25 '13 at 20:20
  • @KingKing Where did the space after the first closing quote come from? – Jerry Sep 26 '13 at 04:58

This works for me:

using System.Text.RegularExpressions;

string input = "\"Jim\" , \"Smythe\", \"Favorite Quote: \"This is my favorite quote.\"\"";
// The opening quote must not be followed by a field separator and must not itself open a field.
string pattern = "\"(?!\\s*,\\s*\")((?<!(,|^)\\s*\"\\w*?)[^\"]+)\"";
var output = Regex.Match(input, pattern).Groups[1].Value;
//output = This is my favorite quote.

// Wrap the captured text in escaped quotes.
var replacedOutput = Regex.Replace(input, pattern, "\\\"$1\\\"");
King King