4

Possible Duplicate:
How to count lines fast?

I have some files that contain data, one record per line.

I want to get the line count of a file so I can show progress to the user. (I process these files in the background, reading them line by line.)

I can do this by reading the file completely, but these files are so big that my application unnecessarily consumes RAM space.

So I want to get the line count of a file without reading the whole file.

How can I do this?

Uğur Aldanmaz

5 Answers

8
  1. Get the size of the file in bytes; the OS will tell you this without reading the file.
  2. Read the first 1000 lines (and process them).
  3. Calculate the average line size in bytes.
  4. Divide the file size by this average line size.
  5. Now you have an estimate of the number of lines in the file, accurate enough for something like a progress bar.
  6. If this is not accurate enough, recompute the estimate every now and then as you read the file.
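The steps above can be sketched roughly as follows. This is only an illustration, not the answerer's code: the `LineCountEstimator` class and its `Estimate` method are hypothetical names, and the sketch assumes UTF-8 text with single-byte `\n` line terminators (with `\r\n` files the estimate will be slightly high).

```csharp
using System;
using System.IO;
using System.Text;

public static class LineCountEstimator
{
    // Estimates the total line count from the average byte length of the
    // first `sampleLines` lines. Assumes UTF-8 text and "\n" line endings.
    public static long Estimate(string path, int sampleLines = 1000)
    {
        // The file size comes from file system metadata; nothing is read here.
        long fileSize = new FileInfo(path).Length;
        if (fileSize == 0) return 0;

        long sampledBytes = 0;
        int sampled = 0;
        using (var reader = new StreamReader(path, Encoding.UTF8))
        {
            string line;
            while (sampled < sampleLines && (line = reader.ReadLine()) != null)
            {
                // +1 accounts for the line terminator that ReadLine strips.
                sampledBytes += Encoding.UTF8.GetByteCount(line) + 1;
                sampled++;
            }
        }

        if (sampled == 0) return 0;

        double averageLineSize = (double)sampledBytes / sampled;
        return (long)Math.Round(fileSize / averageLineSize);
    }
}
```

In a real application the sampling would normally happen inside the processing loop itself, so the first lines are not read twice, and the estimate can be refreshed as more lines are seen.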
High Performance Mark
  • Thanks for the answer. This is the optimal solution for my problem – Uğur Aldanmaz Dec 17 '12 at 13:17
  • For a progress bar I found this to be accurate enough: `long fileSize = new FileInfo(dataPath).Length; bytesProcessed += line.Length; progress = (double)bytesProcessed / fileSize;` (assuming `line` is a string) – chrispepper1989 Feb 27 '15 at 10:01
7

Obviously you cannot. The only way to get the line count is to count the newline characters in the file, and you need to read the file to do that.

> I can do this by reading the file completely, but these files are so big that my application unnecessarily consumes RAM.

You can read the file in parts (each small enough to fit in memory) and accumulate the line count across the parts.
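A minimal sketch of that chunked approach might look like the following. The `NewlineCounter` class name and the 64 KB default buffer are assumptions made for illustration, and the code assumes an ASCII-compatible encoding such as UTF-8, where the byte `0x0A` always means a line feed.

```csharp
using System;
using System.IO;

public static class NewlineCounter
{
    // Counts lines by scanning fixed-size chunks for '\n' bytes, so memory
    // use stays at roughly bufferSize no matter how large the file is.
    // Assumes an ASCII-compatible encoding (e.g. UTF-8).
    public static long CountLines(string path, int bufferSize = 64 * 1024)
    {
        long count = 0;
        byte lastByte = (byte)'\n'; // so an empty file yields 0 lines
        var buffer = new byte[bufferSize];

        using (FileStream stream = File.OpenRead(path))
        {
            int read;
            while ((read = stream.Read(buffer, 0, buffer.Length)) > 0)
            {
                for (int i = 0; i < read; i++)
                {
                    if (buffer[i] == (byte)'\n') count++;
                }
                lastByte = buffer[read - 1];
            }
        }

        // A final line without a trailing newline still counts as a line.
        if (lastByte != (byte)'\n') count++;
        return count;
    }
}
```

The whole file is still read from disk, but only one buffer is ever held in memory at a time.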

zerkms
5
var lineCount = File.ReadLines(@"C:\file.txt").Count();
VladL
  • Method's name is `ReadAllLines` – Ivan Golović Dec 17 '12 at 13:14
  • Based on @zerkms's answer (`Obviously you cannot`), I think this is a good answer, or rather a quick way to count. – Kaf Dec 17 '12 at 13:15
  • @Kaf: could you elaborate? – zerkms Dec 17 '12 at 13:16
  • @zerkms, you are saying it cannot be done. (+1 for that answer as well). This may not be answering the question (`without loading to memory`) but it is still a neat way of counting lines. – Kaf Dec 17 '12 at 13:18
  • @zerkms yes, I understand what you mean. I agree with you. What I meant was that this is a `quick` way of counting lines; of course it will load the file. – Kaf Dec 17 '12 at 13:21
  • @zerkms Although this answer **reads** the whole file, it doesn't load all the content into memory. `ReadLines` returns `IEnumerable<string>` – L.B Dec 17 '12 at 13:22
  • @zerkms Exactly as you said. It doesn't store anything. – L.B Dec 17 '12 at 13:26
  • @L.B: just checked it, yes. :-S Then I was completely wrong in all my comments here – zerkms Dec 17 '12 at 13:27
  • As far as I know, `ReadLines` uses `yield return`, something like lazy loading – VladL Dec 17 '12 at 13:27
  • @L.B: Thanks for coming up with facts. I was also wrong in agreeing with `@zerkms`. – Kaf Dec 17 '12 at 13:32
0

It's impossible to count the lines of a text file exactly without reading it (though you can make a guess based on the first n lines). You do not need to read the whole file into memory at once, however: you can read it line by line, e.g. with `ReadLine`, which won't consume much RAM. Also have a look at this similar question.

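using System.IO;

// Counts lines by streaming the file; only one line is held in memory at a time.
static long CountLines(string f)
{
    long count = 0;
    using (StreamReader r = new StreamReader(f))
    {
        // ReadLine returns null at the end of the file.
        while (r.ReadLine() != null)
        {
            count++;
        }
    }

    return count;
}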
illegal-immigrant
0

Another possibility, which only applies if you are also responsible for developing the application that produces the data files, is to have it create two files: one that contains the data, and one that contains only the line count of the data file. Then, when you are ready to process the data file, your processing application can read the line count from the count file before it starts processing the data file.

If you have no access to the data generation application, just disregard this answer as it will not be applicable to your problem.
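As a rough illustration of the two-file idea, here is a sketch in which the producer writes the count next to the data. Everything here is an assumption for the example: the `LineCountSidecar` class, its method names, and the `.count` file suffix are hypothetical, not part of any existing tool.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.IO;

public static class LineCountSidecar
{
    // Hypothetical naming convention: "data.txt" -> "data.txt.count".
    public static string CountPathFor(string dataPath) => dataPath + ".count";

    // Producer side: writes the data lines and records how many were written.
    public static void WriteWithCount(string dataPath, IEnumerable<string> lines)
    {
        long count = 0;
        using (var writer = new StreamWriter(dataPath))
        {
            foreach (string line in lines)
            {
                writer.WriteLine(line);
                count++;
            }
        }

        File.WriteAllText(CountPathFor(dataPath),
                          count.ToString(CultureInfo.InvariantCulture));
    }

    // Consumer side: reads the count without touching the data file.
    public static long ReadCount(string dataPath)
    {
        string text = File.ReadAllText(CountPathFor(dataPath));
        return long.Parse(text.Trim(), CultureInfo.InvariantCulture);
    }
}
```

The consumer would call `ReadCount` once before it starts processing, then use the result as the total for the progress display.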

Kevin