4

Possible Duplicate:
How to get line count cheaply in Python?

Good day. I have the code below, which reads a file line by line and increments a counter for each line.

def __set_quantity_filled_lines_in_file(self):
    count = 0
    with open(self.filename, 'r') as f:
        for line in f:
            count += 1
    return count

My question is: is there a way to determine how many lines of text the file contains without iterating over it line by line?

Thanks!

Dmitry Zagorulkin

5 Answers

5

In general it's not possible to do better than reading every character in the file and counting newline characters.

It may be possible if you know details about the internal structure of the file. For example, if the file is 1024kB long, and every line is 1kB in length, then you can deduce there are 1024 lines in the file.
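A minimal sketch of that shortcut, assuming every line (terminator included) occupies exactly the same number of bytes. The function name `count_fixed_width_lines` and the `record_size` parameter are hypothetical, introduced only for illustration.

```python
import os


def count_fixed_width_lines(path, record_size):
    """Return the line count of a file whose lines all occupy record_size bytes.

    Assumption: record_size includes the line terminator, and every line
    really has that length. Only the file size is queried; the contents
    are never read.
    """
    if record_size <= 0:
        raise ValueError("record_size must be positive")
    size = os.path.getsize(path)
    if size % record_size != 0:
        raise ValueError("file size is not a multiple of record_size")
    return size // record_size
```

If the assumption does not hold for even one line, the result is wrong, which is why this only works when the file layout is known in advance.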

Li-aung Yip
3

I'm not sure whether Python has such a function (I highly doubt it), but it would essentially require reading the whole file anyway. A newline is signified by the \n character (the exact terminator is system dependent), so there is no way to know how many of them a file contains without going through the whole file.
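To illustrate the system-dependent part, here is a sketch that relies on Python 3's universal newlines mode, in which text-mode reading treats CR, LF and CRLF alike as line ends. The function name `count_lines_universal` is hypothetical, and the latin-1 encoding is an assumption chosen only so that decoding never fails. It still goes through the whole file.

```python
def count_lines_universal(path):
    """Count lines in text mode, treating CR, LF and CRLF as line ends."""
    # newline=None (the default) enables universal newlines mode.
    # latin-1 maps every byte to a character, so decoding cannot fail;
    # that is acceptable here only because the decoded text is discarded.
    with open(path, "r", encoding="latin-1", newline=None) as f:
        return sum(1 for _ in f)
```

A final line without a terminator is counted as a line as well.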

user845279
1

No, such information can only be retrieved by iterating over the whole file's content, or by reading the whole file into memory. Unless you know for sure that the files will always be small, don't even think about doing the latter.

Even if you do not loop over the file contents, the functions you call do. For example, len(f.readlines()) will read the whole file into a list just to count the number of elements. That's horribly inefficient since you don't need to store the file contents at all.
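As a sketch of counting without storing the contents, the file can be read in fixed-size binary chunks and the LF bytes counted in each one. The function name `count_newlines` and the 1 MiB default chunk size are assumptions made for illustration. The whole file is still read, so this is O(n), but memory use stays bounded by the chunk size.

```python
def count_newlines(path, chunk_size=1024 * 1024):
    """Count LF bytes in a file, holding at most chunk_size bytes in memory."""
    count = 0
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:  # an empty bytes object signals end of file
                break
            count += chunk.count(b"\n")
    return count
```

Note that this counts terminators, not lines: a last line that lacks a trailing LF is not included, and files that use a bare CR as terminator would need a different byte to be counted.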

ThiefMaster
  • I think other posts here have proved this statement untrue. Iteration is not the only way. – Jay M May 12 '12 at 08:49
  • 2
    @JasonMorgan - are you saying you know how to count the occurrences of `\r\n` in a file in less than O(n) time? If so, please provide details. – Li-aung Yip May 12 '12 at 08:50
  • 1
    @JasonMorgan What else does e.g. your `Counter()` do other than iterate over the file's content? And what else does your `f.read()` do other than read the whole file content, needing an unnecessary amount of memory? – glglgl May 12 '12 at 08:53
  • 2
    @JasonMorgan: I was not talking about the code but about what actually happens. `len(r.readlines())` does it without iterating manually but the whole file is read into a list and then thrown away after determining its length. So it's a waste of memory (although that only applies for a rather short time) – ThiefMaster May 12 '12 at 08:54
  • Thank you, Jason. I think I will have another process write this information into a few bytes in the file. When I need to know how many lines of text the file contains, I will read those bytes. – Dmitry Zagorulkin May 12 '12 at 16:54
  • Sorry, perhaps I was not clear. What I meant is that iteration in *Python* is not the only way. Lower-level languages are much faster at finding patterns in memory. E.g. my answer uses the collections module, which is compiled code, not Python. As others have said, you could write your own optimised module and wrap it. – Jay M Apr 18 '15 at 10:09
1

You could use the file's readlines() method; this is probably the easiest way.

If you want to be different, you could use the read() method to get the entire file and count the CR, LF, CRLF, and LFCR character combinations using the collections.Counter class.
However, you will have to deal with the various ways of terminating lines.
Something like:

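import collections
import re

# Read the raw bytes so that no newline translation takes place.
with open("myfile", "rb") as f:
    d = f.read()
# Counter(d) alone would only tally single bytes, so split out the
# terminators first; the pattern tries the two-byte sequences first.
c = collections.Counter(re.findall(rb'\r\n|\n\r|\r|\n', d))
lines1 = c[b'\r\n']  # CRLF
lines2 = c[b'\n\r']  # LFCR
lines3 = c[b'\r']    # lone CR
lines4 = c[b'\n']    # lone LF
nlines = lines1 + lines2 + lines3 + lines4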
Jay M
  • I'm not interested in the easiest way; I'm looking for a scalable way, and the fastest way, to perform this action. – Dmitry Zagorulkin May 12 '12 at 08:54
  • Assuming your files are always less than 2G, the fastest and most scalable way is going to be to do it in C. Create a Python extension in C which just counts lines from a buffer in memory. – Jay M May 12 '12 at 14:09
  • '\n\r' will be treated as two lines on most platforms, no? – anatoly techtonik Apr 17 '15 at 09:28
  • @JasonMorgan, nah - this approach doesn't work - http://stackoverflow.com/questions/29695861/get-newline-stats-for-a-text-file-in-python – anatoly techtonik Apr 17 '15 at 10:02
  • @techtonik I already stated that, if required, you would have to handle multiple platforms in my answer. Thanks for the link to the other question related to this. – Jay M Apr 18 '15 at 10:07
0

This gives the answer, but reads the whole file and stores the lines in a list:

    len(f.readlines())
Schuh