
I want to split a string on whitespace. However, some parts of the string are enclosed in double quotes and contain spaces, and I don't want those quoted parts to be split.

        if (file == null) return;

        using (StreamReader reader = new StreamReader(file))
        {
            string current_line;
            // A while loop also handles an empty file, where the first
            // ReadLine() returns null and Regex.Split would otherwise throw.
            while ((current_line = reader.ReadLine()) != null)
            {
                // Splits on every run of whitespace, including inside quotes.
                string[] item = Regex.Split(current_line, "\\s+");
                echoItems(item);
            }
        }

The split above also splits inside quoted text, e.g. "Big town" becomes, in my array:

0: "Big

1: town"

EDIT: After trying @vks's answer, the only form the IDE would accept, with all the quotes escaped, was: Regex.Split(current_line, "[ ](?=(?:[^\"]*\"[^\"]*\")*[^\"]*$)");

item is an array, and my print method wraps each element in "[]" when printing the array contents. This was my output:

[0  0   0   1   2   1   1   1   "Album"                 6   6   11  50  20  0   0   0   40  40  0   0   0   1   1] [] [1] [] [1] [] [1] [] [1] [] [1] [] [1] [] [1  0   0   1   3   1   1   1   "CD case"               3   3   7   20  22  0   0   0   60  0   0   0   0   1   1] [] [1] [] [1] [] [1] [] [1] [] [1] [] [1]

As you can see, after splitting, a large portion of the string ends up in a single element when each value should be its own element.

Here is a line from the file I'm trying to split:

0   0   0   1   2   1   1   1   "CD case"                   6   6   11  50  20  0   0   0   40  40  0   0   0   1   1  1  1  1  1  1  1
jn025

1 Answer

[ ](?=(?:[^"]*"[^"]*")*[^"]*$)

Split on this. See the demo:

https://regex101.com/r/sJ9gM7/56

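[ ] matches a single space character.

(?=...) is a lookahead. It succeeds only if an even number of " characters follows the space up to the end of the string, i.e. only complete "something" groups lie ahead of it. If an odd number of " lies ahead, the space is inside a quoted section and is not used as a split point.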
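To make the even-number-of-quotes rule concrete, here is a small sketch that applies the same test by hand, without a regex. The class and method names (LookaheadByHand, SplitPoints) are made up for illustration and are not part of the answer; the loop only mirrors what the lookahead checks.

```csharp
using System;
using System.Collections.Generic;

public static class LookaheadByHand
{
    // Returns the indexes of the spaces that are followed by an even number
    // of '"' characters, i.e. the spaces the lookahead would accept.
    public static List<int> SplitPoints(string text)
    {
        var points = new List<int>();
        for (int i = 0; i < text.Length; i++)
        {
            if (text[i] != ' ') continue;

            // Count the double quotes between this space and the end of the string.
            int quotesAhead = 0;
            for (int j = i + 1; j < text.Length; j++)
            {
                if (text[j] == '"') quotesAhead++;
            }

            // Even count: only complete "..." pairs lie ahead, so we are outside quotes.
            if (quotesAhead % 2 == 0) points.Add(i);
        }
        return points;
    }

    public static void Main()
    {
        // The space at index 4 sits inside "b c" and is skipped.
        Console.WriteLine(string.Join(",", SplitPoints("a \"b c\" d"))); // prints 1,7
    }
}
```

The space inside the quotes has one quote after it (odd), so it is rejected, while the spaces at indexes 1 and 7 have two and zero quotes after them.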

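// In a verbatim string (@"..."), "" is an escaped double quote; this is the
// same pattern as "[ ](?=(?:[^\"]*\"[^\"]*\")*[^\"]*$)" in a regular string.
string strRegex = @"[ ](?=(?:[^""]*""[^""]*"")*[^""]*$)";
Regex myRegex = new Regex(strRegex, RegexOptions.Multiline);
string strTargetString = @"asdasd asdasd asdasdsad ""asdsad sad sa d sad""     asdasd asdsad "" sadsad asd sa dasd""";

// Splits on each space outside quotes; quoted sections stay in one piece.
return myRegex.Split(strTargetString);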
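The output in the question's edit contains many empty elements, which suggests the fields in the file are separated by runs of several spaces or tabs, while [ ] matches exactly one space at a time. As a sketch under that assumption (not something confirmed in the thread), the \s+ from the original question can be combined with the same lookahead so that a whole run of whitespace counts as one separator. The names QuotedSplitDemo and SplitOutsideQuotes are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

public static class QuotedSplitDemo
{
    // \s+ consumes a whole run of spaces/tabs; the lookahead only accepts the
    // match when an even number of double quotes remains up to the end of input.
    private static readonly Regex Separator =
        new Regex(@"\s+(?=(?:[^""]*""[^""]*"")*[^""]*$)");

    public static string[] SplitOutsideQuotes(string line)
    {
        // Trim first so leading/trailing whitespace does not yield empty fields.
        return Separator.Split(line.Trim());
    }

    public static void Main()
    {
        // Assumed sample: fields separated by mixed spaces and a tab.
        string line = "0   0\t1   \"CD case\"   6  6";
        foreach (string field in SplitOutsideQuotes(line))
        {
            Console.WriteLine("[" + field + "]");
        }
    }
}
```

With this input the quoted field comes back as a single element, quotes included ("CD case"), and no empty elements appear between the numeric fields.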
vks
  • Please, could you explain it a bit? – javier_domenech Apr 07 '15 at 10:58
  • The only way I was able to get the IDE to accept the regex string was like this: Regex.Split(current_line, "[ ](?=(?:[^\"]*\"[^\"]*\")*[^\"]*$)"); since there were quotes so I don't know if that affects the regex at all but my output seems different to what you are getting. I'll provide my output in my edit. – jn025 Apr 07 '15 at 11:20
  • Thanks for the update, still not working with my current text. Please see my edit for the output I'm getting. It is just read from a file and no additional formatting is done before reading it, perhaps that's my problem? I'll provide the first line of the file that's being read. – jn025 Apr 07 '15 at 11:31
  • just tried with a normal string such as the one you provided and works fine, error on my part. – jn025 Apr 07 '15 at 12:11
  • @joe yo!!!!!!!!!!!!!!!!!1 – vks Apr 07 '15 at 12:20