
The code below iterates over some large .txt files.

10 minutes to iterate over an 80MB file: is that what I should expect? Is there something fundamentally wrong with my approach?

import os
import time

print 'File size = ' + str(os.path.getsize(FullPath))
print time.gmtime()
i = 0
with open(FullPath) as FileObj:
    for line in FileObj:
        i += 1
print i
print time.gmtime()

OUTPUT:

File size = 80536606
time.struct_time(tm_year=2015, tm_mon=4, tm_mday=27, tm_hour=15, tm_min=16, tm_sec=6, tm_wday=0, tm_yday=117, tm_isdst=0)
140614
time.struct_time(tm_year=2015, tm_mon=4, tm_mday=27, tm_hour=15, tm_min=26, tm_sec=21, tm_wday=0, tm_yday=117, tm_isdst=0)
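Comparing two `time.gmtime()` printouts by eye is error-prone; a small sketch (assuming the same `FullPath`-style argument, written with `print()` calls so it runs under Python 2 or 3) that measures elapsed wall-clock time directly with `time.time()`:

```python
import time

def time_iteration(path):
    # Count lines while measuring elapsed wall-clock time directly,
    # instead of eyeballing two time.gmtime() struct printouts.
    start = time.time()
    count = 0
    with open(path) as file_obj:
        for _ in file_obj:  # the file object yields one line at a time
            count += 1
    elapsed = time.time() - start
    print('%d lines in %.2f seconds' % (count, elapsed))
    return count
```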

I based my code on these links:

How to read large file, line by line in python

Looping through big files takes hours in Python

  • is the program hogging your cpu? – Phillip Martin Apr 27 '15 at 15:42
  • @PhillipMartin. If I understand your question: Task manager shows CPU usage at 5%/6% and memory 3.something GB. – BuckTurgidson Apr 27 '15 at 15:44
  • 1
    I ran your code on a csv that is 140MB and the time, including getting size of the file, to print the number of lines was just under 4 seconds. So the approach seems right. Move your big file to the folder you are running the code from, if you can, and see if that changes anything. – Scott Apr 27 '15 at 15:50
  • @Scott. Indeed I'm loading this file to memory from a server that I'm not even sure where's located. Now that I say that it's kind of obvious... – BuckTurgidson Apr 27 '15 at 15:54
  • 3
    @BuckTurgidson I figured you were reading it from somewhere other than your machine. That's probably the hang up. – Scott Apr 27 '15 at 15:56
  • Do you just want to get the number of lines in the file, or is that just some placeholder code? – ASCIIThenANSI Apr 27 '15 at 16:01
  • @ASCIIThenANSI. Placeholder. – BuckTurgidson Apr 27 '15 at 16:02
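As the comments conclude, the bottleneck is almost certainly reading the file over the network line by line, not the iteration itself. A minimal sketch of Scott's suggestion (the function name and the use of a temp directory are illustrative assumptions): copy the file to local disk once, then iterate over the local copy, timing each phase separately to confirm where the time goes.

```python
import os
import shutil
import tempfile
import time

def count_lines_locally(remote_path):
    # One sequential bulk copy to local disk is far faster than pulling
    # the file over a network share in many small line-sized reads.
    local_path = os.path.join(tempfile.mkdtemp(),
                              os.path.basename(remote_path))
    t0 = time.time()
    shutil.copy(remote_path, local_path)
    t1 = time.time()

    # Iterate over the local copy; the file object yields one line at
    # a time, so memory use stays constant regardless of file size.
    count = 0
    with open(local_path) as file_obj:
        for _ in file_obj:
            count += 1
    t2 = time.time()

    print('copy: %.2fs, counting: %.2fs' % (t1 - t0, t2 - t1))
    os.remove(local_path)
    return count
```

If the copy phase dominates, the 10 minutes was network transfer time, not a problem with the per-line loop.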
