1

I have a very large text file (several GB in size) which I need to read into Python and then process line by line.

One approach would be to simply call data = f.readlines() and then process the content. With that approach I know the total number of lines and can easily measure the progress of my processing. This, however, is probably not the ideal approach given the file size.

The alternative (and I think better) option would be to say:

    for line in f:
        do_something(line)

Just now I am not sure how to measure my progress anymore. Is there a good option that does not add a huge overhead? (One reason I want to know the progress is to get a rough indicator of the remaining time, since all lines in my file are of similar size, and to check whether my script is still doing something or has gotten stuck somewhere.)
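One idea I have considered, though I am not sure about the overhead or whether it is correct, is to estimate progress from the byte position rather than a line count (the file name and reporting interval below are just placeholders):

    import os

    path = "big_file.txt"                   # placeholder name
    total_bytes = os.path.getsize(path)
    bytes_read = 0

    with open(path, "rb") as f:             # binary mode so len(line) is in bytes
        for i, line in enumerate(f, 1):
            bytes_read += len(line)
            text = line.decode("utf-8")     # do something with text here
            if i % 100_000 == 0:            # report occasionally to keep overhead low
                print(f"{bytes_read / total_bytes:.1%} done")

Would that be a reasonable way to do it, or is there something better?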

P-M
  • There is no better way than you suggested. See here: http://stackoverflow.com/questions/845058/how-to-get-line-count-cheaply-in-python – Serbitar Jan 27 '16 at 12:28

1 Answer

0

If you are using a Linux OS, there seems to be a way:

    import os

    a = os.popen("wc -l some.txt")  # run "wc -l" on the file
    f = a.read()                    # e.g. "1000000 some.txt"

Reading the output gives you the line count followed by the file name.
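A rough sketch of how that count could drive a simple progress printout (the file name and the reporting interval are just examples):

    import os

    path = "some.txt"
    # wc -l prints "<count> <filename>"; the first field is the line count
    total_lines = int(os.popen("wc -l " + path).read().split()[0])

    with open(path) as f:
        for i, line in enumerate(f, 1):
            # ... process line ...
            if i % 100_000 == 0:
                print(f"{i}/{total_lines} lines ({i / total_lines:.1%})")

Note that this scans the whole file once up front, which adds some extra I/O, but wc is very fast.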

Rahul K P