
I need to split a line of code read with File.ReadLines() on ';' when the source contains something like this (two or more statements on one line):

    string firstString = "xyzxyz"; string secondString = "zyxzyx";

The problem is that the strings themselves can contain another ; or even a ", so this line:

    string firstString = "xyz;xyz\"inside quote\""; string secondString = "zyx;zyx";

looks like this after it is read:

    "string firstString = \"xyz;xyz\\\"inside quote\\\"\"; string secondString = \"zyx;zyx\";

So I figured I could use a Regex to tell whether a ';' is inside a string, based on the difference between \" and \\", but I can't figure out how to match \" without also matching \\". I've tried:

    "[^\\\\]\"" or "[^\\]\""

but it does not work. Thanks in advance.
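For reference, .NET regular expressions support negative lookbehind, so `(?<!\\)"` matches a double quote that is not immediately preceded by a backslash. Unlike `[^\\]"`, it consumes nothing before the quote, so it also matches a quote at the very start of a piece. The sketch below is only an illustration (`QuoteCounter` and `CountUnescapedQuotes` are hypothetical names), and it is a partial fix at best: a closing quote that follows an escaped backslash, as in the literal `"C:\\"`, is still not counted.

```csharp
using System;
using System.Text.RegularExpressions;

public static class QuoteCounter
{
    // (?<!\\)" : a double quote NOT immediately preceded by a backslash.
    // The lookbehind is zero-width, so a quote at index 0 can match too.
    private static readonly Regex UnescapedQuote = new Regex(@"(?<!\\)""");

    public static int CountUnescapedQuotes(string text) => UnescapedQuote.Matches(text).Count;

    public static void Main()
    {
        // Same escaped form as in the question; in memory each inner quote has ONE backslash before it.
        string statement = "string firstString = \"xyz;xyz\\\"inside quote\\\"\"";
        Console.WriteLine(CountUnescapedQuotes(statement));               // 2 (opening and closing quote)

        // The original pattern needs a character before the quote, so it misses index 0.
        Console.WriteLine(Regex.Matches("\"abc\"", "[^\\\\]\"").Count);   // 1
        Console.WriteLine(CountUnescapedQuotes("\"abc\""));               // 2

        // Limitation: "C:\\" ends with an escaped backslash, yet its closing quote is not counted.
        Console.WriteLine(CountUnescapedQuotes("\"C:\\\\\""));            // 1
    }
}
```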

EDIT: my only problem is the regex; I have already written the rest like this:

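    // Needs: using System.Collections.Generic; using System.Linq; using System.Text.RegularExpressions;
    List<string> vrlSplitedLine = vrlLines[i].Trim().Split(';').ToList();
    List<string> vrlFinallSplitedLine = new List<string>();
    string vrlReatachedString = string.Empty;
    for (int j = 0; j < vrlSplitedLine.Count; j++)
    {
        // An odd quote count means this piece ends inside a string literal.
        // The j + 1 guard keeps an unterminated literal on the last piece from indexing past the end.
        if (j + 1 < vrlSplitedLine.Count && Regex.Matches(vrlSplitedLine[j], "[^\\\\]\"").Count % 2 != 0)
        {
            vrlReatachedString = vrlSplitedLine[j];
            int k = j;
            do
            {
                // Re-attach the next piece together with the ';' that Split removed.
                k++;
                vrlReatachedString = vrlReatachedString + ';' + vrlSplitedLine[k];
            }
            while (k + 1 < vrlSplitedLine.Count && Regex.Matches(vrlSplitedLine[k], "[^\\\\]\"").Count % 2 == 0);
            vrlFinallSplitedLine.Add(vrlReatachedString);
            j = k;
        }
        else
        {
            vrlFinallSplitedLine.Add(vrlSplitedLine[j]);
        }
    }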
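For comparison, the split-and-reattach step can be avoided entirely with a small character scanner that walks the line once, tracks whether it is inside a literal, and splits on ';' only outside one. This is a sketch, not a complete C# lexer: `StatementSplitter` is a hypothetical name, and it assumes regular strings with backslash escapes, `@"..."` verbatim strings (where `""` is a quote) and char literals such as `';'`. It does not understand comments, interpolation holes containing quotes, or raw string literals, which is where a real parser such as Roslyn becomes necessary.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class StatementSplitter
{
    // Splits one line of C# source on ';' characters that are outside string/char literals.
    // Sketch only: comments, interpolation holes with nested quotes and raw literals are NOT handled.
    public static List<string> Split(string line)
    {
        var parts = new List<string>();
        var current = new StringBuilder();
        int i = 0;
        while (i < line.Length)
        {
            char c = line[i];
            if (c == '@' && i + 1 < line.Length && line[i + 1] == '"')
            {
                // Verbatim string: backslash is ordinary, "" is an escaped quote.
                current.Append("@\"");
                i += 2;
                while (i < line.Length)
                {
                    bool isQuote = line[i] == '"';
                    if (isQuote && i + 1 < line.Length && line[i + 1] == '"')
                    {
                        current.Append("\"\"");
                        i += 2;
                        continue;
                    }
                    current.Append(line[i]);
                    i++;
                    if (isQuote) break; // a lone quote closes the literal
                }
            }
            else if (c == '"' || c == '\'')
            {
                // Regular string or char literal: a backslash escapes the next character.
                char quote = c;
                current.Append(c);
                i++;
                while (i < line.Length)
                {
                    char d = line[i];
                    current.Append(d);
                    i++;
                    if (d == '\\' && i < line.Length)
                    {
                        current.Append(line[i]); // an escaped character can never close the literal
                        i++;
                    }
                    else if (d == quote)
                    {
                        break;
                    }
                }
            }
            else if (c == ';')
            {
                // Statement separator outside any literal.
                parts.Add(current.ToString().Trim());
                current.Clear();
                i++;
            }
            else
            {
                current.Append(c);
                i++;
            }
        }
        // Keep trailing text that has no closing ';' (e.g. a statement continued on the next line).
        string rest = current.ToString().Trim();
        if (rest.Length > 0) parts.Add(rest);
        return parts;
    }

    public static void Main()
    {
        string line = "string firstString = \"xyz;xyz\\\"inside quote\\\"\"; string secondString = \"zyx;zyx\";";
        foreach (string statement in Split(line))
            Console.WriteLine(statement);
    }
}
```

For the line from the question this prints `string firstString = "xyz;xyz\"inside quote\""` and `string secondString = "zyx;zyx"`. As with `Split(';')`, the separators themselves are dropped.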
  • Sorry, that is too problematic with a single regex. You should search for a solution based on some code parser. For C#, Roslyn might be of help. – Wiktor Stribiżew May 09 '19 at 09:11
  • Unless you put an @ symbol in front of a C# string, \ is treated as an escape character. Anything following a \ is treated as a special character, thus \" inside of a string allows you to embed double quotes, which would normally en – Steve Todd May 09 '19 at 09:22
  • Wiktor, please look at my edit. Is it still too problematic? – theLegend27 May 09 '19 at 09:26
  • With regex, [] defines a set of characters, so [^\\] translates as "any character BUT \". Try looking for predefined regex strings for parsing CSVs. – Steve Todd May 09 '19 at 09:28
  • Possible duplicate of [C#, regular expressions : how to parse comma-separated values, where some values might be quoted strings themselves containing commas](https://stackoverflow.com/questions/1189416/c-regular-expressions-how-to-parse-comma-separated-values-where-some-values) – Steve Todd May 09 '19 at 09:30
  • Well, this post does not solve my problem. – theLegend27 May 09 '19 at 09:41
  • I agree with @WiktorStribiżew. This is a problem for a parser, not a regular expression. Parsing a line of code is fundamentally too complex for regular expressions. – Harrison McCullough May 09 '19 at 17:57
  • The problem is that you cannot easily tell whether the semicolon and backslash are in a string literal or in a code comment. You will end up writing a parser, but there is one already. – Wiktor Stribiżew May 09 '19 at 18:10
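The CSV approach from the linked duplicate can also be adapted to a single `Regex.Matches` call: instead of counting quotes, each match is one whole statement, built from complete `"..."` literals (backslash escapes allowed inside) and characters that are neither a quote nor a semicolon. This is a sketch assuming the line contains only regular string literals; as the comments above point out, code comments, char literals such as `';'` and verbatim `@"..."` strings would still break it. `RegexStatementSplitter` is a hypothetical name.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class RegexStatementSplitter
{
    // Regex: (?:"(?:[^"\\]|\\.)*"|[^";])+
    // One statement = a run of complete "..." literals (where \x is an escape)
    // or single characters that are neither a quote nor a semicolon.
    // The alternatives cannot overlap, so there is no catastrophic backtracking.
    private static readonly Regex Statement = new Regex(@"(?:""(?:[^""\\]|\\.)*""|[^"";])+");

    public static List<string> Split(string line) =>
        Statement.Matches(line)
                 .Cast<Match>()
                 .Select(m => m.Value.Trim())
                 .Where(s => s.Length > 0) // drop whitespace-only pieces such as "a; ;b"
                 .ToList();

    public static void Main()
    {
        string line = "string firstString = \"xyz;xyz\\\"inside quote\\\"\"; string secondString = \"zyx;zyx\";";
        foreach (string statement in Split(line))
            Console.WriteLine(statement);
    }
}
```

Because a ';' can only be consumed inside a string-literal alternative, the matches stop exactly at the separators that lie outside literals, and a literal ending in an escaped backslash (`"C:\\"`) is handled correctly, unlike with quote counting.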

0 Answers