I am writing a regex to extract a string enclosed within different types of double quotes (", “, ”).

So far I have written the code below. It works as expected, but the regex is quite big and I think it can be improved.

//mergeField = {{field(“dd/MMM/yyyy")}}
var startIndex = mergeField.IndexOf('(');
var endIndex = mergeField.IndexOf(')');
string format = mergeField.Substring(startIndex: startIndex + 1, length: endIndex - startIndex - 1);

string formatSpecifierPattern = string.Format(@"{0}|{1}|{2}|{3}|{4}|{5}", @"(\“.*?\”)|(\”.*?\”)", "\\\"(.*?)\\\"", "“(.*?)\\\"", "\\\"(.*?)“", "”(.*?)\\\"", "\\\"(.*?)”");
MatchCollection matches = Regex.Matches(format, formatSpecifierPattern);

In the code above, mergeField is passed as a parameter and it can contain any combination of quotes. Is there any way I can simplify my regex and still handle all combinations of (", “, ”)?

Input - Expected Output

{{field(“dd/MMM/yyyy")}} -> “dd/MMM/yyyy"
{{field("dd/MMM/yyyy“)}} -> "dd/MMM/yyyy“
{{field("dd/MMM/yyyy")}} -> "dd/MMM/yyyy"
{{field(“dd/MMM/yyyy“)}} -> “dd/MMM/yyyy“
{{field(”dd/MMM/yyyy")}} -> ”dd/MMM/yyyy"
{{field("dd/MMM/yyyy”)}} -> "dd/MMM/yyyy”
{{field(”dd/MMM/yyyy”)}} -> ”dd/MMM/yyyy”
daemonium
  • Do you have a specific input? Can you tell us a bit more about the why? If you are trying to harmonize the quotes, you could perhaps simply split on those chars, or replace them? – Drag and Drop Feb 08 '21 at 15:42
  • Maybe `(["“”].*?["“”])` will be sufficient? – Andreas Louv Feb 08 '21 at 15:44
  • You are perhaps looking for a simple `([“"”].+?[“"”])` like https://regex101.com/r/G8EU1s/1/, in C# it will be `string pattern = @"([“""”].+?[“""”])";` – Drag and Drop Feb 08 '21 at 15:45
  • Btw., in a verbatim string (`@"..."`) double quotes are escaped by doubling them. I.e., `"` becomes `""`, not by using a backslash in C#. And you don't need to escape these different types of quotes in regex either. – Olivier Jacot-Descombes Feb 08 '21 at 15:45
  • @Drag and Drop, this is a business requirement and the end output of the `{{field(“dd/MMM/yyyy")}}` will be `08/Feb/2021` (Or whatever data comes from DB). And what do you mean by split chars / replace? Could you please explain? – daemonium Feb 08 '21 at 15:47
  • My question was about the next step, because this looks like an XY problem: you write a complicated regex to find something, and down the line you change every "_bad_" quote into a valid one. The whole exercise reminds me of a recent case of someone using regex to fix an HTML template that had been vandalized by Outlook. – Drag and Drop Feb 08 '21 at 15:52
  • @Drag and Drop, the solution which you have given in comment works as expected, could you please put it as answer so I can mark it? – daemonium Feb 08 '21 at 15:59

1 Answer

If all you care about is <one kind of quote> <some text> <one kind of quote> and the kind of quote doesn't matter, then all you need to do is:

["“”](.*?)["“”]

Try it online

Explanation:

  • ["“”]: One of these characters
  • (.*?): Any characters, matched lazily (as few as possible), captured in group 1
  • ["“”]: One of these characters
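In C#, the pattern above could be used roughly as follows. This is only a minimal sketch: the class name `QuoteExtractor` and its two methods are hypothetical names chosen for illustration, and it assumes the merge field contains at most one quoted value. Note that inside a verbatim string (`@"..."`) the straight double quote is written as `""`.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper for illustration; not part of the asker's code.
public static class QuoteExtractor
{
    // Character class accepts ", “ or ” on either side; (.*?) lazily captures the text between.
    private static readonly Regex QuotedText = new Regex(@"[""“”](.*?)[""“”]");

    // Returns the matched text including its surrounding quotes, or null if nothing matches.
    public static string ExtractWithQuotes(string mergeField)
    {
        Match match = QuotedText.Match(mergeField);
        return match.Success ? match.Value : null;
    }

    // Returns only the text between the quotes (capture group 1), or null if nothing matches.
    public static string ExtractInner(string mergeField)
    {
        Match match = QuotedText.Match(mergeField);
        return match.Success ? match.Groups[1].Value : null;
    }
}
```

With this sketch, `QuoteExtractor.ExtractWithQuotes("{{field(“dd/MMM/yyyy\")}}")` returns the value with its original quotes, as in the expected output of the question, while `ExtractInner` returns just `dd/MMM/yyyy`, which is probably what a later date-formatting call would need.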

If the <some text> has a format you know, you could also specify that to prevent capturing some other text enclosed in quotes (if that is a possibility)

For example: ["“”]\d{1,2}/\w{3}/\d{4}["“”] will match "08/Feb/2021", but not "dd/MMM/yyyy" or "abcdefgh" Try it online
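The stricter pattern can be checked in C# with `Regex.IsMatch`. Again a sketch only: `QuotedDateCheck` and `IsQuotedDate` are hypothetical names, and the pattern is the one from the example above, written as a verbatim string.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper for illustration.
public static class QuotedDateCheck
{
    // Quote, 1-2 digits, slash, 3 word characters, slash, 4 digits, quote.
    private static readonly Regex QuotedDate =
        new Regex(@"[""“”]\d{1,2}/\w{3}/\d{4}[""“”]");

    // True when the input contains a quoted value shaped like 08/Feb/2021.
    public static bool IsQuotedDate(string input)
    {
        return QuotedDate.IsMatch(input);
    }
}
```

Here `IsQuotedDate("\"08/Feb/2021\"")` is true, while `IsQuotedDate("\"dd/MMM/yyyy\"")` is false because `dd` does not satisfy `\d{1,2}`.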

Pranav Hosangadi