
I have a text file which contains 200000 rows. I want to read the first 50000 rows, process them, and then read the second part, say rows 50001 to 100000, and so on. When I read the second block I don't want to loop over rows 1 to 50000 again. I want the reader pointer to go directly to row number 50001 and start reading there.

How can this be done? Which reader should I use for that?

Roman Starkov
Mahesh Chitroda
  • [StringReader](http://msdn.microsoft.com/ru-ru/library/system.io.stringreader.aspx) can read a file line by line. The easiest way is simply not to close it between blocks. – Tommi Jul 12 '13 at 10:31
  • try MemoryMappedFile, a class designed for this scenario. http://stackoverflow.com/questions/4273699/how-to-read-a-large-1-gb-txt-file-in-net?lq=1 – ValidfroM Jul 12 '13 at 11:13

4 Answers


You need the StreamReader class.

With it you can read line by line using the ReadLine() method. You will need to keep track of the line count yourself and call a method to process your data every 50000 lines, but as long as you keep the reader open you will not need to restart reading from the top.
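A minimal sketch of that approach (the file name, the 50000 block size, and the `ProcessInBlocks` helper name are illustrative, not from the answer):

```csharp
using System;
using System.Collections.Generic;
using System.IO;

static class BlockReader
{
    // Read the file once, front to back, handing every 'blockSize'
    // lines to 'process'; the single open reader never rewinds.
    public static void ProcessInBlocks(string path, int blockSize,
                                       Action<List<string>> process)
    {
        using (var reader = new StreamReader(path))
        {
            var block = new List<string>(blockSize);
            string line;
            while ((line = reader.ReadLine()) != null)
            {
                block.Add(line);
                if (block.Count == blockSize)
                {
                    process(block);
                    block.Clear();
                }
            }
            if (block.Count > 0)
                process(block); // trailing partial block
        }
    }
}
```

Usage would look like `BlockReader.ProcessInBlocks("data.txt", 50000, rows => DoWork(rows));` — the reader stays open for the whole file, so each block continues exactly where the previous one stopped.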

Steve

Unfortunately, there is no way to skip counting the lines. At the raw level, files do not work on a line-number basis; they work on a position/offset basis. The file system has no concept of lines; lines are a concept added by higher-level components.

So there is no way to tell the operating system "open the file at the specified line". Instead you have to open the file and scan forward, counting newlines until you have passed the specified number, then collect bytes until you hit the next newline.
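The skip-by-counting idea can be sketched like this (the `LineSkipper` name is made up for illustration; the 50000 figure comes from the question):

```csharp
using System.IO;

static class LineSkipper
{
    // Discard 'count' lines by reading past them; with
    // variable-length lines there is no cheaper way,
    // as explained above.
    public static void Skip(StreamReader reader, int count)
    {
        for (int i = 0; i < count; i++)
            if (reader.ReadLine() == null)
                break; // file had fewer lines than requested
    }
}
```

After `LineSkipper.Skip(reader, 50000)`, the next `reader.ReadLine()` call returns row 50001.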

However, if every line contains the same number of bytes, you can seek directly to the desired position:

using (Stream stream = File.Open(fileName, FileMode.Open))
{
    // Jump straight to the start of line 'myLine' (1-based),
    // relying on every line being exactly 'bytesPerLine' bytes.
    stream.Seek(bytesPerLine * (myLine - 1), SeekOrigin.Begin);
    using (StreamReader reader = new StreamReader(stream))
    {
        string line = reader.ReadLine();
    }
}
Rakesh

I believe the best way would be to use a StreamReader.

Here are two questions related to yours from which you can take answers. Ultimately, though, reading the file in blocks of text is very hard to do unless the block size is fixed.

I believe these would be a good read for you:

This one shows how to separate the reading into blocks. Its answer is the best fit: you can keep a count of how many lines you have read, check whether the count has reached 50000 (or whatever boundary you choose), and then process that block.

As you can see, that answer makes use of the continue keyword, which I believe will be useful for what you intend to do.

This one gives a more readable approach, but doesn't really answer what you are looking for in reading blocks.

Finally, I think the question has confused you a little: it sounds as if you want to select 50000 lines and read them as one unit. That is not the way StreamReader works. Yes, reading line by line makes the process longer, but unfortunately that's the case.

Philip Gullick

Unless the rows are exactly the same length, you can't start directly at row 50001.

What you can do, however, is when reading the first 50000 rows, remember where the last row ends. You can then seek directly to that offset and continue reading from there.
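A sketch of that offset-remembering idea: because StreamReader buffers internally, the offset is tracked here by counting bytes per line rather than by asking the stream for its position. It assumes ASCII text with CRLF ("\r\n") line endings; the `ResumeReader` name and its methods are illustrative:

```csharp
using System.IO;
using System.Text;

static class ResumeReader
{
    // Read 'count' lines and return the byte offset just past them,
    // assuming ASCII text with CRLF line endings.
    public static long OffsetAfterLines(string path, int count)
    {
        long offset = 0;
        using (var reader = new StreamReader(path, Encoding.ASCII))
        {
            for (int i = 0; i < count; i++)
            {
                string line = reader.ReadLine();
                if (line == null) break;
                offset += Encoding.ASCII.GetByteCount(line) + 2; // +2 for "\r\n"
            }
        }
        return offset;
    }

    // Later: seek straight to that offset and continue reading.
    public static string ReadLineAt(string path, long offset)
    {
        using (var stream = File.OpenRead(path))
        {
            stream.Seek(offset, SeekOrigin.Begin);
            using (var reader = new StreamReader(stream, Encoding.ASCII))
                return reader.ReadLine();
        }
    }
}
```

With `long off = ResumeReader.OffsetAfterLines("data.txt", 50000);` saved after the first pass, `ResumeReader.ReadLineAt("data.txt", off)` resumes at row 50001 without re-reading anything.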

Where the row length is fixed, you do something like this:

myfile.Seek(50000 * (rowCharacters + 2), SeekOrigin.Begin);

Seek moves to a specific offset in bytes, so you just need to tell it how many bytes 50000 rows occupy. Given an ASCII encoding, that's the number of characters per line, plus 2 for the newline sequence.

Roman Starkov
  • Yes, the row length is the same. How can I seek to position 50001? I have a counter which keeps track of the last read row – Mahesh Chitroda Jul 12 '13 at 10:33
  • @MaheshChitroda I've provided a direct answer to what you asked; however, I should point out that I wouldn't do it like this. Steve's answer is a better approach. – Roman Starkov Jul 12 '13 at 10:37