
Are there optimized Python packages to determine how many lines there are in a big ASCII file without loading the entire file into memory? This is different from the topic of How to get line count cheaply in Python?, where that question is concerned with built-in Python solutions.

ebressert
    Yes and no. You can do it without having the entire file in memory at one time, by reading in smaller chunks, but every byte of the file still needs to be loaded into memory at some point. – twalberg Jun 07 '13 at 17:31
  • "This is different than..." how is it different? The questions read exactly the same to me. – ean5533 Sep 12 '13 at 15:30
  • I'm interested in finding a Python package that is *faster* than what can be done by the default tools that come with Python. In the question "How to get line count cheaply in Python" the solutions are focused solely on built-in solutions. – ebressert Sep 12 '13 at 20:26

2 Answers


You can iterate through it, line-by-line:

with open('filename.txt', 'r') as handle:
    num_lines = sum(1 for line in handle)

It might be faster to read it in larger chunks and just count the newlines:

with open('filename.txt', 'r') as handle:
    num_lines = 0

    # read() returns '' at EOF, so '' (not None) is the correct sentinel
    for chunk in iter(lambda: handle.read(1024*1024), ''):
        num_lines += chunk.count('\n')
Blender
  • Internally, large chunks of the file are already buffered in memory. The iterator isn't reading a single line from disk at a time. It's already going through the file byte-by-byte, returning a line when it encounters a new-line character. – chepner Jun 07 '13 at 18:25

Another option is to use `fileinput`'s `lineno` method:

import fileinput
x = fileinput.input('test.csv')
for line in x:
    pass
print x.lineno()
x.close()

which prints 3 for this sample file.
iruvar