
I have a problem reading a single line from a large file encoded in UTF-8. Lines in the file have a constant length in characters.

On average the file has 300k lines. Time is the main constraint, so I want to do this the fastest way possible.

I've tried LINQ:

    File.ReadLines("file.txt").Skip(noOfLines).Take(1).First();

But this is not fast enough.

My biggest hope was to use a stream and set its position to the start of the desired line, but the problem is that the lines' sizes in bytes differ.

Any ideas how to do it?

Sudet
  • If each line is a fixed length you can simply calculate the offset and read from there to the line length with a FileStream, e.g. http://stackoverflow.com/a/8678918/246342 – Alex K. Dec 09 '16 at 12:23
  • @AlexK.: "using the stream, and setting its possition to the desired line start, but the problem is that lines sizes in bytes differ" – Tim Schmelter Dec 09 '16 at 12:23
  • You could simplify your query: `File.ReadLines("file.txt").ElementAtOrDefault(noOfLines)` – Tim Schmelter Dec 09 '16 at 12:24
  • @Evk: He's using `ReadLines` not `ReadAllLines`, that's doing the same – Tim Schmelter Dec 09 '16 at 12:26
  • Is that file static, or are there many files like this that change often? How often do you need to get a specific line from a given file? – Evk Dec 09 '16 at 12:30
  • Files are provided by the external system and I need to read the set of lines only once for each file I get (the number of lines that I need to read is drastically smaller than the number of lines in the file). – Sudet Dec 09 '16 at 12:36
  • Possible duplicate of [How do I read a specified line in a text file?](http://stackoverflow.com/questions/1262965/how-do-i-read-a-specified-line-in-a-text-file) – Chris Schmitz Dec 09 '16 at 12:48
  • Then at least you can read all lines you need in one go. So if you need say lines 3,5,10,12 you don't have to read file 4 times (which you will do with solution in your question). – Evk Dec 09 '16 at 12:51
  • @Dr.Coconut I have read this post, but the problem is time. Proposed solution is not fast enough. – Sudet Dec 09 '16 at 12:51
  • Read the file in chunks until you got all your lines. There is no way the system could guess where the line breaks are. Unless you can index them beforehand. Hence the question whether they are static.. – TaW Dec 09 '16 at 12:56
  • @Evk I have this sorted out, but I was wondering if there's the fastest way of reading the single line. – Sudet Dec 09 '16 at 12:57
  • Btw: `StreamReader` is the fastest – Jim Dec 09 '16 at 13:21
  • @Jim: Yeah I know, but how do I use StreamReader to read a single line from a file when the file is encoded in UTF-8 and is too big to read line by line? – Sudet Dec 09 '16 at 13:34
  • @Sudet http://stackoverflow.com/questions/8037070/whats-the-fastest-way-to-read-a-text-file-line-by-line – Jim Dec 09 '16 at 13:38
  • @Sudet http://stackoverflow.com/questions/2161895/reading-large-text-files-with-streams-in-c-sharp – Jim Dec 09 '16 at 13:39
  • If it is too slow, then you have an I/O problem. Take a faster hard drive (e.g. SSD over PCIe). – Oliver Dec 09 '16 at 15:24

1 Answer


Now this is where you don't want to use LINQ (-: What you actually want is to find the nth occurrence of a newline in the file and read from there up to the next newline.
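As a rough sketch of that idea (not benchmarked, and not necessarily the fastest variant): in UTF-8 the byte `0x0A` only ever encodes a line feed, because every byte of a multi-byte sequence is `0x80` or above, so you can count newlines on the raw bytes and decode only the one line you need. The names `LineReader` and `ReadLineAt` are made up for this example, the 64 KB buffer size is an arbitrary choice, and the code assumes `\n` or `\r\n` line endings and a zero-based line index.

```csharp
using System;
using System.IO;
using System.Text;

static class LineReader
{
    // Returns the line at the zero-based lineIndex, or null if the file has fewer lines.
    // Assumption: UTF-8 content with "\n" or "\r\n" line endings.
    public static string ReadLineAt(string path, int lineIndex)
    {
        using (var fs = new FileStream(path, FileMode.Open, FileAccess.Read,
                                       FileShare.Read, 1 << 16, FileOptions.SequentialScan))
        using (var line = new MemoryStream())
        {
            var buffer = new byte[1 << 16];
            int current = 0; // index of the line currently being scanned
            int read;
            while ((read = fs.Read(buffer, 0, buffer.Length)) > 0)
            {
                for (int i = 0; i < read; i++)
                {
                    byte b = buffer[i];
                    if (b == (byte)'\n')
                    {
                        if (current == lineIndex)
                            return Decode(line, lineIndex);
                        current++;
                    }
                    else if (current == lineIndex)
                    {
                        // Collect raw bytes of the wanted line only.
                        line.WriteByte(b);
                    }
                }
            }
            // The wanted line may be the last one and lack a trailing newline.
            if (current == lineIndex && line.Length > 0)
                return Decode(line, lineIndex);
            return null;
        }
    }

    static string Decode(MemoryStream line, int lineIndex)
    {
        byte[] bytes = line.GetBuffer();
        int count = (int)line.Length;
        if (count > 0 && bytes[count - 1] == (byte)'\r')
            count--; // drop the CR of a CRLF pair
        string text = Encoding.UTF8.GetString(bytes, 0, count);
        // A UTF-8 byte order mark, if present, would show up at the start of line 0.
        return lineIndex == 0 ? text.TrimStart('\uFEFF') : text;
    }
}
```

Calling `LineReader.ReadLineAt("file.txt", noOfLines)` should then return the same line as the LINQ query in the question, without creating a string for every skipped line. It still has to read every byte before the target line, so the I/O cost stays proportional to the offset.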

You probably want to check out this documentation on memory mapped files as well: https://msdn.microsoft.com/en-us/library/system.io.memorymappedfiles.memorymappedfile(v=vs.110).aspx
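A minimal sketch of how the same newline scan might look on top of a memory-mapped file. `MappedLineReader` and `ReadLineAt` are hypothetical names, the code assumes `\n` or `\r\n` line endings and a zero-based line index, and it does not strip a UTF-8 byte order mark from the first line. I have not measured it; per-byte `ReadByte` calls on the accessor carry some overhead, so treat it as an illustration rather than a tuned solution.

```csharp
using System;
using System.IO;
using System.IO.MemoryMappedFiles;
using System.Text;

static class MappedLineReader
{
    // Returns the line at the zero-based lineIndex, or null if the file has fewer lines.
    public static string ReadLineAt(string path, int lineIndex)
    {
        long length = new FileInfo(path).Length;
        if (length == 0)
            return null; // an empty file cannot be mapped

        using (var mmf = MemoryMappedFile.CreateFromFile(
                   path, FileMode.Open, null, 0, MemoryMappedFileAccess.Read))
        // Limit the view to the real file length; a default-sized view may be
        // padded up to a page boundary.
        using (var view = mmf.CreateViewAccessor(0, length, MemoryMappedFileAccess.Read))
        {
            long pos = 0;
            // Skip lineIndex newlines; pos ends up just past the last one skipped.
            for (int skipped = 0; skipped < lineIndex; pos++)
            {
                if (pos >= length)
                    return null;
                if (view.ReadByte(pos) == (byte)'\n')
                    skipped++;
            }
            if (pos >= length)
                return null;

            // Find the end of the wanted line.
            long end = pos;
            while (end < length && view.ReadByte(end) != (byte)'\n')
                end++;

            var bytes = new byte[end - pos];
            if (bytes.Length > 0)
                view.ReadArray(pos, bytes, 0, bytes.Length);

            int count = bytes.Length;
            if (count > 0 && bytes[count - 1] == (byte)'\r')
                count--; // drop the CR of a CRLF pair
            return Encoding.UTF8.GetString(bytes, 0, count);
        }
    }
}
```

Usage is the same as for a stream-based reader: `MappedLineReader.ReadLineAt("file.txt", noOfLines)`.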

There is also a post comparing different access methods http://cc.davelozinski.com/c-sharp/fastest-way-to-read-text-files

Alexander Taran