
I need to process some files (mainly log files) and have to apply a regex to each line. I use:

using (FileStream fs = File.Open("logs.txt", FileMode.Open, FileAccess.Read, FileShare.ReadWrite))
using (BufferedStream bs = new BufferedStream(fs))
using (StreamReader sr = new StreamReader(bs))
{
    string line;
    while ((line = sr.ReadLine()) != null)
    {
       // apply the regex to the line and write the result to a new file
    }
}

But the logs also contain "name changes", so if someone changed their name I need to process the logs again, this time with another name. Can I make it start reading from a certain line, to save some time when doing this?

user1546616
  • string line -> use a stringbuilder instead. string line = File.ReadLines(FileName).Skip(14).Take(1).First(); –  Jun 27 '17 at 17:05
  • @cutzero thanks I will try it – user1546616 Jun 27 '17 at 17:06
  • you could remember the offset where to pick up in the next pass, and use the [Seek function](https://stackoverflow.com/a/5404324/1132334) – Cee McSharpface Jun 27 '17 at 17:07
  • Possible duplicate of [How do I read a specified line in a text file?](https://stackoverflow.com/questions/1262965/how-do-i-read-a-specified-line-in-a-text-file) –  Jun 27 '17 at 17:08
  • I've been doing this for 40 years. Each text file is a little different, and without seeing a sample of the file it is very difficult to give any good answers. You shouldn't have to keep returning to the top of the file if the code is written properly. Depending on the size of the file, all the data can be parsed in one sweep. For huge files this may not be practical due to the amount of memory it can take. – jdweng Jun 27 '17 at 17:12
  • @cutzero, yeah, but this still needs to transmit the data from the file to memory, the LINQ just hides that complexity. Under the hood, `Skip` uses the iterator, which calls MoveNext n times, which in turn [calls ReadLine](https://referencesource.microsoft.com/#mscorlib/system/io/ReadLinesIterator.cs,49) n times... from there I'd predict that a `Seek` approach is always faster, regardless if all lines are equal length or not. – Cee McSharpface Jun 27 '17 at 17:14
  • you're completely right –  Jun 27 '17 at 17:18

1 Answer


There is no feature to seek to a certain line, but you can seek to a certain offset (byte position). There is the File.ReadLines().Skip() approach, but in the current implementation of the .NET Framework it still reads and discards every skipped line, so it only saves the processing, not the reading.
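For reference, the line-based variant could look roughly like this. It is only a sketch: `LineSkip`, `LinesFrom` and `firstLine` are names made up for illustration. Note that `File.ReadLines` opens the file itself, so it does not use the `FileShare.ReadWrite` setting from the question.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Linq;

static class LineSkip
{
    // Lazily yields the lines of `path`, starting at zero-based line index `firstLine`.
    // The skipped lines are still read from disk one by one; they are just not returned.
    public static IEnumerable<string> LinesFrom(string path, int firstLine)
    {
        return File.ReadLines(path).Skip(firstLine);
    }
}
```

This is convenient when you know the line number, but the cost of reaching that line grows with its position in the file.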

So when you stop processing a file, store the current offset. When you want to pick up from there later, Seek to the stored offset (of course this is only a valid approach if we can safely assume that parts of the file before the stored offset won't change between passes).

Read this for a possible implementation.
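A minimal sketch of the offset approach described above, not a definitive implementation. `LogResume`, `ReadLinesFrom` and `handleLine` are hypothetical names, and the code assumes the log is UTF-8 (or ASCII) without a byte order mark and uses `\n` or `\r\n` line endings. It counts bytes itself rather than asking a `StreamReader` for its position, because `StreamReader` reads ahead into a buffer and the underlying stream position would then point past the line you just processed.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

static class LogResume
{
    // Processes every complete line starting at byte offset `start` and returns
    // the offset just past the last complete line, for use in a later pass.
    // Assumption: UTF-8/ASCII text, no BOM, lines terminated by "\n" or "\r\n".
    public static long ReadLinesFrom(string path, long start, Action<string> handleLine)
    {
        using (var fs = new FileStream(path, FileMode.Open, FileAccess.Read, FileShare.ReadWrite))
        {
            fs.Seek(start, SeekOrigin.Begin);

            long offset = start;               // bytes consumed as complete lines
            var lineBytes = new List<byte>();  // bytes of the line being assembled
            var buffer = new byte[64 * 1024];
            int read;

            while ((read = fs.Read(buffer, 0, buffer.Length)) > 0)
            {
                for (int i = 0; i < read; i++)
                {
                    if (buffer[i] == (byte)'\n')
                    {
                        // In UTF-8 the byte 0x0A only ever means a line feed,
                        // so splitting on it cannot cut a character in half.
                        string line = Encoding.UTF8.GetString(lineBytes.ToArray()).TrimEnd('\r');
                        handleLine(line);
                        offset += lineBytes.Count + 1; // +1 for the '\n' itself
                        lineBytes.Clear();
                    }
                    else
                    {
                        lineBytes.Add(buffer[i]);
                    }
                }
            }

            // A trailing line without a newline is left unprocessed on purpose:
            // it may still be in the middle of being written.
            return offset;
        }
    }
}
```

On the first pass you would call `LogResume.ReadLinesFrom("logs.txt", 0, line => { /* regex here */ })` and store the returned value; on a later pass you pass that stored value as `start`. As noted above, this only works if the part of the file before the stored offset does not change between passes.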

Cee McSharpface
  • I think a good simple implementation to start at a certain position is: https://stackoverflow.com/a/7596554/3873799. It starts from a certain byte position (not line). – alelom Feb 17 '23 at 15:22