I have a CSV file that I read in and convert to a TXT file by writing out the values of each column, comma-separated. I want the program to also be able to convert the TXT file back to a CSV, so I'm creating a TXTReader class. I'm having trouble reading in big TXT files. I first tried it using String.Split:

string fullText = File.ReadAllText(fileName);
string[] values = fullText.Split(',');

This worked at first but started causing problems when columns containing strings with commas in them showed up: the program treated each comma as a column separator even when it was part of a string. I looked for a solution and found https://stackoverflow.com/a/3147901/1870760. This works perfectly with small files but is really slow with my 31 MB TXT files. I then tried my own hacky approach, iterating over all the characters in fullText and checking for "\"" (all of the strings are wrapped in quotes in the TXT), but this also took a long time (~10 minutes). I also can't use https://stackoverflow.com/a/3148691/1870760 because my string column values sometimes contain \n, which makes the reader think a new row has started when it hasn't.

So, do I just have to accept that reading a 31 MB TXT file and splitting the values into columns will take a while, or are there more efficient ways to do this?

Hatted Rooster
  • A simple thing you can do to speed things up is use RegexOptions.Compiled. Other options include getting a faster regex query (probably not really possible) or using threading. – Vajura Aug 03 '15 at 10:52
  • `TextFieldParser` deals with all of that - [Parse comma seperated string with a complication in C#](http://stackoverflow.com/questions/30078054/parse-comma-seperated-string-with-a-complication-in-c-sharp) – Alex K. Aug 03 '15 at 10:52
  • @AlexK. As noted in the question, I can't use `TextFieldParser` because it reads the values per newline, and my strings can contain `\n`. – Hatted Rooster Aug 03 '15 at 10:53
  • So swap \n with another char, parse, swap back? – Alex K. Aug 03 '15 at 10:53
  • @AlexK. hmm, that sounds like a solution.. but that would mean I'd need to swap it with a character that's not used anywhere else in any string, what ASCII value would that be? – Hatted Rooster Aug 03 '15 at 10:56
  • Maybe you should run your code through a profiler, 10 minutes to iterate through a 31MB file sounds **really** excessive – Kevin Gosse Aug 03 '15 at 10:59
  • It does not have to be a character; replace \n (when not preceded by \r; `(?<!\r)\n`) with a string `"{WHATEVERTOKEN}"` – Alex K. Aug 03 '15 at 11:11
  • even though you say you can't use parts from [an old answer](http://stackoverflow.com/questions/3147836/c-sharp-regex-split-commas-outside-quotes/3148691#3148691) I would still suggest you have a look at the codeproject link I **[gave](http://www.codeproject.com/Articles/9258/A-Fast-CSV-Reader)** - this one should be able to handle multi-line records – macf00bar Aug 03 '15 at 11:19

1 Answer

There's a project that is said to be 15 times faster than regex for CSV reading / splitting, with low memory usage. It even supports data binding if you want to display the data later. The source code is available.

You can customize many parameters (including line-breaking options), so I assume it will be intelligent enough to handle the \n in your values; it definitely handles commas inside values.

http://www.codeproject.com/Articles/9258/A-Fast-CSV-Reader
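
If pulling in a library is not an option, a hand-written single-pass parser is another route. The following is only a minimal sketch, not the API of the linked project: the `QuotedCsv` class and `ReadRows` method are hypothetical names, and it assumes fields are optionally wrapped in double quotes, that a quote inside a quoted field is written as `""`, and that rows end in `\n` or `\r\n`. Commas and line breaks inside quotes are kept as data. It reads each character once and builds fields with a `StringBuilder`, so the work should grow linearly with file size.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

public static class QuotedCsv
{
    // Single-pass, quote-aware parser (illustrative sketch).
    // Assumptions: fields may be wrapped in double quotes, "" inside a
    // quoted field means one literal quote, rows end in \n or \r\n.
    // Blank lines are skipped; a bare \r outside quotes is dropped.
    public static IEnumerable<List<string>> ReadRows(TextReader reader)
    {
        var field = new StringBuilder();
        var row = new List<string>();
        bool inQuotes = false;
        bool rowHasContent = false; // true once the current row has any data
        int c;
        while ((c = reader.Read()) != -1)
        {
            char ch = (char)c;
            if (inQuotes)
            {
                if (ch == '"')
                {
                    // "" is an escaped quote; a single " closes the field.
                    if (reader.Peek() == '"') { reader.Read(); field.Append('"'); }
                    else inQuotes = false;
                }
                else field.Append(ch); // commas and \n are plain data here
            }
            else if (ch == '"') { inQuotes = true; rowHasContent = true; }
            else if (ch == ',')
            {
                row.Add(field.ToString());
                field.Clear();
                rowHasContent = true;
            }
            else if (ch == '\r') { /* ignored; the following \n ends the row */ }
            else if (ch == '\n')
            {
                if (rowHasContent) { row.Add(field.ToString()); yield return row; }
                field.Clear();
                row = new List<string>();
                rowHasContent = false;
            }
            else { field.Append(ch); rowHasContent = true; }
        }
        // Emit the last row if the input does not end with a newline.
        if (rowHasContent) { row.Add(field.ToString()); yield return row; }
    }
}
```

Wrapping the file in a `StreamReader` (for example `using (var reader = new StreamReader(fileName))`) and iterating `QuotedCsv.ReadRows(reader)` with `foreach` streams the rows without loading the whole file into one string first.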

Marc Wittmann