
I'm trying to initialize a List<string> with data from a file. The file is a list of words separated by line breaks, so currently I am doing:

var wordList = new List<string>(textFromFile.Split(new[] { "\r\n", "\r", "\n" }, StringSplitOptions.None));

but for the size of text files I'm dealing with (172,888 lines in one of the files!) this is very slow. Is there a better way to do this? The text file doesn't have to keep its current format; I could parse it and write it out in a different format if there is a better way of storing the data. In C++ I would be thinking of binary data and a memcpy, but I don't think there is a similar solution in C#?

If it's relevant, the code is in a Unity app, so it is limited to the early .NET capabilities of Unity's Mono version.

Real World
  • Well, you don't show how you read the *file*. Your code only splits an entire string; how you obtained that string isn't mentioned. – MakePeaceGreatAgain Nov 29 '17 at 09:44
  • You might want to use the `File.ReadAllLines` method, as it does exactly what you're looking for – Fabjan Nov 29 '17 at 09:45
  • One thing - if you have control over how the file is written, why not just standardize on, say, `\n` delimiters? Then you'll only need to split on one character. – StuartLC Nov 29 '17 at 09:45
  • How slow is it? How big is the file? How slow do you need it to be (and _as fast as possible is not a valid answer_)? – mjwills Nov 29 '17 at 09:49
  • The file is currently a TextAsset in Unity, though, again, it doesn't HAVE to be. So the reading of the file is already done. The speed cost is in parsing that text to create the List. – Real World Nov 29 '17 at 10:08
  • @StuartLC good point – Real World Nov 29 '17 at 10:08
  • @RealWorld Use `TPL` or `PLinq` to split it in parallel – Fabjan Nov 29 '17 at 10:12
  • Note that if you don't really need it to be a `List` (that is, you don't need to add or remove items), then just call `ReadAllLines` and use that array, because constructing a list from the array copies every element, which is not necessary. – Evk Nov 29 '17 at 10:20

1 Answer


You might want to use File.ReadAllLines to read the file. It does exactly what you're looking for, and it should be well optimized.

var wordList = File.ReadAllLines("yourFileSrc");
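Note that File.ReadAllLines returns a string[], not a List<string>. If the text is already in memory (as with a Unity TextAsset), one alternative worth measuring is reading it line by line with a StringReader instead of calling Split. The following is only a sketch under those assumptions; the class and method names (WordListLoader, LoadAsArray, ParseLines) are made up for illustration, and whether it is actually faster on a given Mono version would need profiling.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical helper; the names are illustrative, not from the original post.
public static class WordListLoader
{
    // Reads every line of a file into an array in one call.
    public static string[] LoadAsArray(string path)
    {
        return File.ReadAllLines(path);
    }

    // Builds a List<string> from text that is already in memory.
    // StringReader.ReadLine treats "\r\n", "\r" and "\n" as line endings.
    // Unlike Split with StringSplitOptions.None, a trailing line break
    // does not produce a final empty entry.
    public static List<string> ParseLines(string text)
    {
        var words = new List<string>();
        using (var reader = new StringReader(text))
        {
            string line;
            while ((line = reader.ReadLine()) != null)
            {
                words.Add(line);
            }
        }
        return words;
    }
}
```

Usage would look like `var wordList = WordListLoader.ParseLines(textFromFile);`, where textFromFile is the string from the question.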

To improve performance even more, you may want to split your file into N smaller files and process them in parallel using the TPL (Task Parallel Library), or use the .AsParallel method (as kindly suggested by Evk).

You can find more information about PLINQ here.
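As a rough illustration of the .AsParallel approach, the sketch below streams lines with File.ReadLines and runs a per-line step in parallel. The per-line step here (trimming and lower-casing) is purely an assumed placeholder for whatever "processing" is needed, and the names ParallelWordLoader and LoadAndNormalize are hypothetical. It requires .NET 4 or later and working thread support, which may not hold on every Unity target.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

// Hypothetical helper; the names are illustrative, not from the original post.
public static class ParallelWordLoader
{
    public static List<string> LoadAndNormalize(string path)
    {
        return File.ReadLines(path)   // streams lines lazily instead of loading them all first
            .AsParallel()             // hands lines to PLINQ worker threads
            .AsOrdered()              // keeps the original line order in the result
            .Select(line => line.Trim().ToLowerInvariant()) // placeholder per-line work
            .ToList();
    }
}
```

If the per-line work is trivial, the overhead of parallelism can outweigh the gain, so this is worth benchmarking against the plain sequential version.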

*** Update

To parse a large string, you might want to first cut it (without parsing it) into a number of smaller strings and then process those in parallel.
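One possible way to do that is sketched below: cut the text into roughly equal chunks, moving each cut forward to just after the next '\n' so that no line is split in two, then split each chunk in parallel. The names ChunkedSplitter, CutAtLineBreaks and SplitInParallel are hypothetical, and the sketch assumes blank lines can be dropped (it uses StringSplitOptions.RemoveEmptyEntries, unlike the question's code). If the text uses bare "\r" line endings only, no cut point is found and everything ends up in a single chunk, which is still correct but not parallel.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper; the names are illustrative, not from the original post.
public static class ChunkedSplitter
{
    private static readonly string[] Separators = { "\r\n", "\r", "\n" };

    // Cuts the text into chunks of roughly equal size, each ending just after a '\n'
    // (or at the end of the text), so every line stays inside one chunk.
    public static List<string> CutAtLineBreaks(string text, int chunkCount)
    {
        var chunks = new List<string>();
        int approxSize = Math.Max(1, text.Length / Math.Max(1, chunkCount));
        int start = 0;
        while (start < text.Length)
        {
            int end = Math.Min(start + approxSize, text.Length);
            if (end < text.Length)
            {
                // Move the cut forward to just past the next line feed.
                int newline = text.IndexOf('\n', end);
                end = newline < 0 ? text.Length : newline + 1;
            }
            chunks.Add(text.Substring(start, end - start));
            start = end;
        }
        return chunks;
    }

    // Splits each chunk into lines on PLINQ worker threads, preserving order.
    // Empty entries (blank lines, trailing line breaks) are dropped.
    public static List<string> SplitInParallel(string text, int chunkCount)
    {
        return CutAtLineBreaks(text, chunkCount)
            .AsParallel()
            .AsOrdered()
            .SelectMany(chunk => chunk.Split(Separators, StringSplitOptions.RemoveEmptyEntries))
            .ToList();
    }
}
```

A call such as `ChunkedSplitter.SplitInParallel(textFromFile, Environment.ProcessorCount)` would be one way to pick the chunk count. Whether this beats a single sequential pass depends on the platform, so it should be measured rather than assumed.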

Fabjan
  • You can process them in parallel even without splitting the file, with `File.ReadLines(file).AsParallel()` for example (depending on what "process" means exactly in this case). – Evk Nov 29 '17 at 09:56
  • Yep, thanks, updated. – Fabjan Nov 29 '17 at 10:04
  • I'm not sure I can use the TPL stuff in Unity. It doesn't support threading on some of the platforms we're targeting. (Sorry, didn't mention Unity in the original question) – Real World Nov 29 '17 at 10:20