1

I have a very large text file (several GB in size) which I need to read into Python and then process line by line.

One approach would be to simply call data = f.readlines() and then process the content. With that approach I know the total number of lines and can easily measure the progress of my processing. This, however, is probably not the ideal approach given the file size.

The alternative (and I think better) option would be to say:

    for line in f:
        do_something(line)

Just now I am not sure how to measure my progress anymore. Is there a good option that does not add a huge overhead? (One reason I want to know the progress is to get a rough indicator of the remaining time, since all lines in my file are of similar size, and to check whether my script is still doing something or has gotten stuck somewhere.)
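One idea I have considered, though I am not sure about the overhead or whether it is correct, is to estimate progress from the byte position rather than a line count (the file name and reporting interval below are just placeholders):

    import os

    path = "big_file.txt"                   # placeholder name
    total_bytes = os.path.getsize(path)
    bytes_read = 0

    with open(path, "rb") as f:             # binary mode so len(line) is in bytes
        for i, line in enumerate(f, 1):
            bytes_read += len(line)
            text = line.decode("utf-8")     # do something with text here
            if i % 100_000 == 0:            # report occasionally to keep overhead low
                print(f"{bytes_read / total_bytes:.1%} done")

Would that be a reasonable way to do it, or is there something better?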

P-M
  • There is no better way than you suggested. See here: http://stackoverflow.com/questions/845058/how-to-get-line-count-cheaply-in-python – Serbitar Jan 27 '16 at 12:28

1 Answer

0

If you are using a Linux OS, there seems to be a way:

    import os

    a = os.popen("wc -l some.txt")  # run "wc -l" on the file
    f = a.read()                    # e.g. "1000000 some.txt"

Reading the output gives you the line count followed by the file name.
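A rough sketch of how that count could drive a simple progress printout (the file name and the reporting interval are just examples):

    import os

    path = "some.txt"
    # wc -l prints "<count> <filename>"; the first field is the line count
    total_lines = int(os.popen("wc -l " + path).read().split()[0])

    with open(path) as f:
        for i, line in enumerate(f, 1):
            # ... process line ...
            if i % 100_000 == 0:
                print(f"{i}/{total_lines} lines ({i / total_lines:.1%})")

Note that this scans the whole file once up front, which adds some extra I/O, but wc is very fast.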

Rahul K P