
If I have a string like this

create myclass "56, 'for the better or worse', 54.781"

How can I parse it so that the result is three string "words" with the following content:

[0] create
[1] myclass
[2] "56, 'for the better or worse', 54.781"

Edit 2: note that the quotation marks must be retained.

At first, I tried using string.Split(' '), but I noticed that it breaks the third string into several other strings.

I tried to limit the Split result by passing 3 as its count argument. That works for this case, but when the given string is

create myclass false "56, 'for the better or worse', 54.781" //or
create myclass "56, 'for the better or worse', 54.781" false

Then Split fails, because two of the words end up combined in the third element.
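To make the failure concrete, here is a minimal sketch of the count-limited split described above. The class and method names (`SplitCountDemo`, `SplitInto3`) are hypothetical; only `string.Split(char[], int)` is the real API being exercised:

```csharp
using System;

// Hypothetical demo class (not from the question): reproduces the
// count-limited string.Split attempt.
public static class SplitCountDemo
{
    // Split on ' ' into at most 3 parts; the 3rd part keeps the
    // unsplit remainder of the string, spaces included.
    public static string[] SplitInto3(string command)
    {
        return command.Split(new[] { ' ' }, 3);
    }
}
```

With the first failing input, the third element comes back as `false "56, 'for the better or worse', 54.781"`, because `count` only caps how many pieces are produced; it knows nothing about quotes.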

I also wrote a method, ReadInBetweenSameDepth, to get the string between the quotation marks.

Here is my ReadInBetweenSameDepth method

//Examples:
//  [1] (2 + 1) * (5 + 6) will return 2 + 1
//  [2] (2 * (5 + 6) + 1) will return 2 * (5 + 6) + 1
public static string ReadInBetweenSameDepth(string str, char delimiterStart, char delimiterEnd) {
  if (delimiterStart == delimiterEnd || string.IsNullOrWhiteSpace(str) || str.Length <= 2)
    return null; //identical delimiters cannot express depth
  int delimiterStartFound = 0;
  int delimiterEndFound = 0;
  int posStart = -1;
  for (int i = 0; i < str.Length; ++i) {
    if (str[i] == delimiterStart) {
      if (i >= str.Length - 2) //delimiter start is found in any of the last two characters
        return null; //it means, there isn't anything in between the two
      if (delimiterStartFound == 0) //first time
        posStart = i + 1; //assign the starting position only the first time...
      delimiterStartFound++; //increase the number of delimiter start count to get the same depth
    }
    if (str[i] == delimiterEnd && delimiterStartFound > 0) { //ignore end delimiters before the first start
      delimiterEndFound++;
      if (delimiterStartFound == delimiterEndFound) {
        if (i - posStart > 0)
          return str.Substring(posStart, i - posStart); //only successful if both delimiters are found in the same depth
        delimiterStartFound = delimiterEndFound = 0; //empty pair such as "()": keep looking for the next pair
      }
    }
  }
  return null;
}

Although this function works, I found it hard to combine its result with string.Split to get the parsing I want.
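For comparison, quote tracking can be folded into the split itself, so that no second pass (and no re-adding of quotes) is needed. This is a minimal sketch under the assumption that double quotes are never nested or escaped; `QuoteAwareSplitter` and `SplitOutsideQuotes` are hypothetical names, not part of the original code:

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical helper (not the asker's method): splits on spaces that lie
// outside double quotes, in one pass, and keeps the quotes in the output.
// Assumes quotes are never nested or escaped.
public static class QuoteAwareSplitter
{
    public static List<string> SplitOutsideQuotes(string input)
    {
        var words = new List<string>();
        var current = new StringBuilder();
        bool inQuotes = false;

        foreach (char c in input)
        {
            if (c == '"')
            {
                inQuotes = !inQuotes;   // toggle on every double quote
                current.Append(c);      // keep the quote itself
            }
            else if (c == ' ' && !inQuotes)
            {
                // A space outside quotes ends the current word;
                // consecutive spaces produce no empty words.
                if (current.Length > 0)
                {
                    words.Add(current.ToString());
                    current.Clear();
                }
            }
            else
            {
                current.Append(c);      // ordinary character, or a space inside quotes
            }
        }

        if (current.Length > 0)
            words.Add(current.ToString());   // flush the last word

        return words;
    }
}
```

On `create myclass "56, 'for the better or worse', 54.781" false` this should yield four words, the third being the quoted section with its quotes intact, in a single linear pass over the input.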

Edit 2: with my poor solution, I have to re-add the quotation marks afterwards.

Is there a better way to do this? If we use Regex, how would we do it?

Edit:

I honestly was not aware that this problem could be solved the same way as parsing CSV-formatted text. Nor did I know that it does not necessarily need to be solved with Regex (which is why I tagged it that way). My sincere apologies to those who see this as a duplicate post.

Edit 2:

After working more on my project, I realized that something was wrong with my question (namely, I had not included the quotation marks). My apologies to the previously best answerer, Mr. Tim Schmelter. Also, after looking at the duplicate link, I noticed that it does not provide the answer for this either.

Ian

3 Answers


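You can split on whitespace that is followed by an even number of double quotes up to the end of the string, i.e. on whitespace that is not inside a quoted section: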

\s(?=(?:[^"]*"[^"]*")*[^"]*$)

See demo.

https://regex101.com/r/fM9lY3/60

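using System.Text.RegularExpressions;

// Split on a whitespace character only if an even number of '"' follows it,
// i.e. only when it lies outside a quoted section.
string strRegex = @"\s(?=(?:[^""]*""[^""]*"")*[^""]*$)";
Regex myRegex = new Regex(strRegex, RegexOptions.Multiline);
string strTargetString = @"create myclass ""56, 'for the better or worse', 54.781""";

return myRegex.Split(strTargetString); // create | myclass | "56, 'for the better or worse', 54.781"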
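A minimal sketch of the same regex wrapped in a reusable class and applied to the two inputs from the question where the count-limited `Split` failed. `RegexQuoteSplit` is a hypothetical name; the pattern is copied unchanged, and the `Multiline` option is dropped on the assumption that inputs are single-line commands:

```csharp
using System.Text.RegularExpressions;

// Hypothetical wrapper around the regex from this answer: splits on
// whitespace followed by an even number of double quotes up to the end
// of the input, i.e. whitespace outside quotes. Quotes are retained.
public static class RegexQuoteSplit
{
    private static readonly Regex OutsideQuotesWhitespace =
        new Regex(@"\s(?=(?:[^""]*""[^""]*"")*[^""]*$)");

    public static string[] Split(string input) => OutsideQuotesWhitespace.Split(input);
}
```

Both inputs should come back as four parts, with the quoted section intact. Note that the lookahead rescans the rest of the string at every whitespace character, so the cost grows roughly quadratically with input length, which relates to the performance concern raised in the comments; for short commands like these it is unlikely to matter.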
vks
  • Thanks, I think this is the best answer since I am using C# for the task. I honestly was not aware that my problem is the same as CSV parsing, though. – Ian Jan 05 '16 at 08:37
    Oh, come on, it is the worst answer here! **Do not use this regex if you can do without it!** Look at how much backtracking it involves. I'd rather choose an answer with more explanation. – Wiktor Stribiżew Jan 05 '16 at 08:45
  • @stribizhev do you have a better answer, sir? My own method is obviously worse than all the given answers. – Ian Jan 05 '16 at 08:46
  • The answer is in the dupe question, or here it is: `public static string[] parse(string csv, string separator) { TextFieldParser parser = new TextFieldParser(new StringReader(csv)); parser.HasFieldsEnclosedInQuotes = true; parser.SetDelimiters(separator); string[] fields = null; while (!parser.EndOfData) fields = parser.ReadFields(); parser.Close(); return fields; }`. Set a space as the separator; that is all, and it is safe. Add `using Microsoft.VisualBasic.FileIO;` and `using System.IO;` – Wiktor Stribiżew Jan 05 '16 at 08:47
  • @stribizhev so, like Mr. Tim Schmelter, you also think `TextFieldParser` is the best method. Thanks, I will check it out. – Ian Jan 05 '16 at 08:48
  • Actually, Tushar gave a more expanded explanation of his regex :) – Wiktor Stribiżew Jan 05 '16 at 19:49

Regex Demo

(\w+|"[^"]*")

Get the matches in the first capture group.

  1. \w+: Matches one or more word characters (letters, digits and underscore)
  2. "[^"]*": Matches a double quote, any run of non-quote characters, and the closing double quote, so the quotes are kept in the match
  3. |: OR condition in regex
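A minimal C# sketch of this approach, assuming `Regex.Matches` is run over the whole input and group 1 is collected from each match; `RegexQuoteTokens` is a hypothetical name:

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical wrapper around the pattern (\w+|"[^"]*"): collects group 1
// of every match, so quoted sections come back with their quotes.
public static class RegexQuoteTokens
{
    private static readonly Regex Token = new Regex(@"(\w+|""[^""]*"")");

    public static List<string> Tokens(string input)
    {
        var result = new List<string>();
        foreach (Match m in Token.Matches(input))
            result.Add(m.Groups[1].Value);   // first capture group
        return result;
    }
}
```

Because unquoted tokens may contain only `\w` characters, an unquoted value such as `54.781` would come back as two tokens, `54` and `781`; that is fine for the inputs in the question, where the only non-word characters appear inside quotes.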
Tushar
  • Thanks, I tested the `Regex` and it worked well. Appreciate the explanation too. – Ian Jan 05 '16 at 08:36
  • Thanks Mr. Tushar. After working further with the data, it seems yours is the best solution, and it comes with an explanation too. Here is my other post which makes me think so: http://stackoverflow.com/questions/34624536/stringsplitoptions-removeemptyentries-equivalent-for-textfieldparser – Ian Jan 06 '16 at 03:04

I would use a real CSV parser for this task. The only one available in the framework is the TextFieldParser class in the Microsoft.VisualBasic.FileIO namespace:

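using System.Collections.Generic;
using System.IO;  // StringReader

string str = "create myclass \"56, 'for the better or worse', 54.781\"";
var allLineFields = new List<string[]>();
// TextFieldParser lives in Microsoft.VisualBasic.dll; on .NET Framework add a reference to it
using (var parser = new Microsoft.VisualBasic.FileIO.TextFieldParser(new StringReader(str)))
{
    parser.Delimiters = new string[] { " " };
    parser.HasFieldsEnclosedInQuotes = true;  // important: spaces inside "..." do not split; the enclosing quotes are stripped
    string[] lineFields;
    while ((lineFields = parser.ReadFields()) != null)  // null once there is no more data
    {
        allLineFields.Add(lineFields);
    }
}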

Result: (screenshot of the parsed fields, not reproduced here)

But there are others available like this or this.

Tim Schmelter
  • Thanks, I did not know that there is a `TextFieldParser` in the VB library which can be used like that. I appreciate your input. +10 – Ian Jan 05 '16 at 08:34
  • @Ian: You can use it with C# without a problem. It is more efficient than using regex if you're actually parsing a larger text. You're welcome – Tim Schmelter Jan 05 '16 at 08:38
  • Oh, I see... my bad. Obviously, since it is compiled to a `dll`, it is a `.NET` class rather than a `VB`-only one, so it can be used easily from C# too. Thanks for the correction. I will also check its performance. – Ian Jan 05 '16 at 08:43
  • @TimSchmelter: You should have closed this question as a dupe long before me. You have answered such questions a million times. – Wiktor Stribiżew Jan 05 '16 at 08:57