
I have a .csv file built up like this:

ABCDE;12345;null;{myname:Sam;};somecontent
XYZ;69;null;{other ; value };someothercontent

The CSV delimiter is a semicolon, but the text between the curly braces can also contain unwanted semicolons. They can come from different sources: a typo in the input, a piece of inline CSS, part of some HTML, …

The good news is that the unwanted semicolons are only present in the content between the curly braces. So the solution I'm looking for is something like a regex that does “remove all semicolons that are between curly braces”. I'm working in .NET.

Can anyone help me with setting up this pattern?

MXM_Sam

2 Answers


Replace all occurrences of this regex with the empty string:

;(?=[^{}]*})

This matches a semicolon only when it is followed by zero or more characters that are not { or }, and then a }. That is the same as saying “only when the next curly brace character is a closing curly brace”.

As code:

line = Regex.Replace(line, ";(?=[^{}]*})", "");
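A minimal, self-contained sketch of the one-liner above, run against the sample lines from the question plus the nested example from the comments below. The class and method names are hypothetical; only Regex.Replace and the pattern come from the answer. It shows that flat {…} blocks are cleaned, while in the nested case the semicolon after "Sam" survives, because the next curly brace after it is a {, not a }.

```csharp
using System;
using System.Text.RegularExpressions;

public static class FlatBraceDemo
{
    // The answer's pattern: a ';' whose next curly brace character is '}'.
    private static readonly Regex SemicolonInBraces = new Regex(";(?=[^{}]*})");

    // Hypothetical helper: removes those semicolons from one CSV line.
    public static string Clean(string line) => SemicolonInBraces.Replace(line, "");

    public static void Main()
    {
        string[] lines =
        {
            "ABCDE;12345;null;{myname:Sam;};somecontent",
            "XYZ;69;null;{other ; value };someothercontent",
            // Nested braces: the ';' after "Sam" is followed by '{', so it is kept.
            "ABCDE;12345;null;{myname:Sam;{myothername:Sammy}};somecontent",
        };

        foreach (string line in lines)
        {
            Console.WriteLine(Clean(line));
        }
        // prints:
        // ABCDE;12345;null;{myname:Sam};somecontent
        // XYZ;69;null;{other  value };someothercontent
        // ABCDE;12345;null;{myname:Sam;{myothername:Sammy}};somecontent
    }
}
```

Note that removing the semicolon from "{other ; value }" leaves two adjacent spaces; the pattern only deletes the semicolon itself.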
Bohemian
  • Thanks, this is already very useful! I ran it over my production data, but now discovered that there are pairs of curly braces inside curly braces, like this: ABCDE;12345;null;{myname:Sam;{myothername:Sammy}};somecontent. I think I have to add a lookbehind to handle this? I haven't found the right solution yet. – MXM_Sam Feb 05 '21 at 13:53
  • This should do it: (?<={[^{}]*);(?=[^{}]*[{}]) – MXM_Sam Feb 05 '21 at 14:36
  • @MXM_Sam is the maximum nesting depth 1, like your example? Or can the nesting go deeper? – Bohemian Feb 05 '21 at 18:39
  • nesting can go 4 levels deep – MXM_Sam Feb 07 '21 at 12:50
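For nesting up to 4 levels (or deeper), one option is a swapped-in technique not used in the thread: .NET balancing groups. The sketch below is an assumption-laden illustration, not either answerer's method. From each ';' the lookahead scans forward, pushing a capture onto the hypothetical group d for every { and popping one for every }; it succeeds only when it reaches a } while the stack is empty, i.e. a } that closes a brace opened before the ';'. It assumes braces are balanced on each line and that no other CSV field contains braces. A plain depth-counting loop is included as a non-regex alternative that does the same job. All class and method names are hypothetical.

```csharp
using System;
using System.Text;
using System.Text.RegularExpressions;

public static class NestedBraceCleaner
{
    // Balancing-group lookahead: (?<d>\{) pushes, (?<-d>\}) pops (and fails when
    // the stack is empty), (?(d)(?!)) rejects a non-empty stack, then a '}' must
    // follow. That '}' therefore closes a brace that encloses the ';'.
    private static readonly Regex NestedPattern = new Regex(
        @";(?=(?>[^{}]+|(?<d>\{)|(?<-d>\}))*(?(d)(?!))\})");

    public static string CleanWithRegex(string line) => NestedPattern.Replace(line, "");

    // Non-regex alternative: track the brace depth and drop ';' while depth > 0.
    public static string CleanWithLoop(string line)
    {
        var sb = new StringBuilder(line.Length);
        int depth = 0;
        foreach (char c in line)
        {
            if (c == '{') depth++;
            else if (c == '}' && depth > 0) depth--;
            else if (c == ';' && depth > 0) continue; // skip semicolons inside braces
            sb.Append(c);
        }
        return sb.ToString();
    }

    public static void Main()
    {
        // Four levels of nesting, with a stray ';' at every level.
        string line = "ABCDE;12345;null;{a;{b;{c;{d;}}}};somecontent";
        Console.WriteLine(CleanWithRegex(line)); // ABCDE;12345;null;{a{b{c{d}}}};somecontent
        Console.WriteLine(CleanWithLoop(line));  // ABCDE;12345;null;{a{b{c{d}}}};somecontent
    }
}
```

The lookahead scans to the end of the input for every semicolon outside braces, so it is best applied line by line rather than to a whole file at once. The loop is linear and easier to maintain; the regex keeps the one-call Regex.Replace shape the question asked for.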

Maybe this can help:

using System;
using System.Text.RegularExpressions;

string text = "ABCDE;12345;null;{myname:Sam;};somecontent" + Environment.NewLine +
              "XYZ;69;null;{other ; value };someothercontent";
string pattern = @"(?<={[^}{\n]*);(?=[^}{\n]*})";
string replaced = Regex.Replace(text, pattern, "");
Console.WriteLine(replaced);
  • (?<={[^}{\n]*) positive lookbehind: there must be an opening curly brace earlier on the same line, with no other curly brace in between
  • ; the semicolon itself
  • (?=[^}{\n]*}) positive lookahead: there must be a closing curly brace later on the same line, with no other curly brace in between
alex-dl