I recently learned about TextFieldParser for parsing words; previously I would use string.Split for this. I have a question about this newly learned class.

If we parse a message like this using string.Split with StringSplitOptions.RemoveEmptyEntries:

string message = "create    myclass   \"56, 'for better or worse'\""; //have multiple spaces
string[] words = message.Split(new char[] { ' ' }, 3, StringSplitOptions.RemoveEmptyEntries);

Then we get words, which contains three elements:

[0] create
[1] myclass
[2] "56, 'for better or worse'"

But if we do it with TextFieldParser:

string str = "create    myclass   \"56, 'for better or worse'\"";
var parser = new Microsoft.VisualBasic.FileIO.TextFieldParser(new StringReader(str)); //treat string as I/O
parser.Delimiters = new string[] { " " };
parser.HasFieldsEnclosedInQuotes = true; 
string[] words2 = parser.ReadFields();

Then the returned array contains some empty entries:

[0] create
[1]
[2]
[3]
[4] myclass
[5]
[6]
[7] "56, 'for better or worse'"

Now, is there an equivalent way to remove the empty words from the resulting array, as StringSplitOptions.RemoveEmptyEntries does for string.Split?

Ian
  • This does not seem like a CSV. Why would you want to use the [much slower](http://stackoverflow.com/a/20456597/2316200) VB6 TextFieldParser instead of String.Split or a Regex? – Pierre-Luc Pineault Jan 06 '16 at 02:12
  • @Pierre-LucPineault I was not aware that this is normally used (only) for CSV text, Sir. But in my previous question, two well-reputed people suggested that I use this for parsing my text, and with better performance, they said. Since I am new, I trusted their judgment and used it. Here is the post: http://stackoverflow.com/questions/34607051/parse-string-with-whitespace-and-quotation-mark I would be glad if you could give an alternative view. – Ian Jan 06 '16 at 02:15
  • @Pierre-LucPineault I opened the link you gave and saw the time difference! How can that be? See the link I gave. The opinion there is quite different. – Ian Jan 06 '16 at 02:17
  • String.Split just splits the string, while the other has much more logic to respect the CSV specification. So for a simple job String.Split will be much faster. Initially I thought you were only searching for an alternative; however, your input is more like a weird CSV separated by n spaces. String.Split would not be able to follow all of the CSV spec unless you add some logic, but a Regex would do the job pretty well and correctly handle any number of spaces. You could benchmark both, but I suspect that in the end the regex would be simpler to use and just as fast. – Pierre-Luc Pineault Jan 06 '16 at 02:28
  • @Pierre-LucPineault Thank you for your input, Sir. I will keep that in mind. Just one last question: is a VB.Net-based class always slower (when doing the same thing as a C#-based class) when run in a C# project? Are they not both .Net? If the VB.Net-based class is run in a VB project, would that make it faster? – Ian Jan 06 '16 at 02:39
  • In this case it was more because of the overhead than strictly because of being in a VB assembly. A bloated CSV parser library would have the same result. I don't have any data on VB assemblies in C# vs VB projects, so I can't say for sure. But I guess both would be compiled to pretty much the same MSIL, so it wouldn't matter. The `TextFieldParser` is simply slow because it is `TextFieldParser`. DotNetPerls also did some benchmarks and [agrees on that](http://www.dotnetperls.com/textfieldparser). – Pierre-Luc Pineault Jan 06 '16 at 02:56
  • @Pierre-LucPineault: the `TextFieldParser` is a pure .NET class, not a VB6 class or anything like that. It's much more powerful if you have to parse real CSV. The linked question (and its accepted answer) is nonsense because it's comparing apples and pears. Use the proper tool: if you have CSV, use a CSV parser; if you have simple and strictly formatted text, it may be enough to use `String.Split`. – Tim Schmelter Jan 06 '16 at 08:23

1 Answer

Maybe this would do the trick:

parser.HasFieldsEnclosedInQuotes = true;
string[] words2 = parser.ReadFields();
// Drop the empty fields produced by consecutive delimiters (needs "using System.Linq;")
words2 = words2.Where(x => !string.IsNullOrEmpty(x)).ToArray();

A one-line alternative could be:

string[] words2 = parser.ReadFields().Where(x => !string.IsNullOrEmpty(x)).ToArray();
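
For completeness, here is a minimal self-contained sketch of the same idea. The class name `FieldCleaner` and the method `ParseNonEmpty` are hypothetical names introduced only for illustration; the null check is an addition that assumes you may also pass empty input, in which case `ReadFields` returns null.

```csharp
using System;
using System.IO;
using System.Linq;
using Microsoft.VisualBasic.FileIO;

public static class FieldCleaner
{
    // Hypothetical helper: parses one space-delimited line with
    // TextFieldParser and removes the empty fields afterwards.
    public static string[] ParseNonEmpty(string line)
    {
        using (var parser = new TextFieldParser(new StringReader(line)))
        {
            parser.TextFieldType = FieldType.Delimited;
            parser.Delimiters = new[] { " " };
            parser.HasFieldsEnclosedInQuotes = true;

            // ReadFields returns null when there is nothing left to read.
            string[] fields = parser.ReadFields() ?? new string[0];

            // Consecutive delimiters yield empty fields; filter them out.
            return fields.Where(f => !string.IsNullOrEmpty(f)).ToArray();
        }
    }
}
```

Calling `FieldCleaner.ParseNonEmpty(str)` with the string from the question should then give three entries instead of eight. If fields that contain only whitespace should also be dropped, `string.IsNullOrWhiteSpace` can be used in the filter instead.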
Mohit S
  • Thanks for your prompt reply, Sir. But this doesn't seem to work. However, I get the idea that I could use `LINQ`. – Ian Jan 06 '16 at 02:40
  • I'd remove the trim and use !string.IsNullOrWhiteSpace(tag) – Camilo Terevinto Jan 06 '16 at 02:41
  • Please remove the first implementation, which doesn't actually work. Also, you could do it all in one line: `string[] words2 = parser.ReadFields().Where(x => !string.IsNullOrEmpty(x)).ToArray();` – Camilo Terevinto Jan 06 '16 at 03:30