I am trying to read a text file with 3 million lines using the following code:
import time

f = open("somefile.txt", "r")
i = 0
st = time.time()
mydata = []
for line in f:
    mydata.append(do_something(line))
    i += 1
    if i % 10000 == 0:
        # time.time() - st is the seconds spent on the last 10000 lines
        print "%d done in %d time..." % (i, time.time() - st)
        st = time.time()
This is the output printed to the console:
10000 done in 6 time...
20000 done in 9 time...
30000 done in 11 time...
40000 done in 14 time...
50000 done in 15 time...
60000 done in 17 time...
70000 done in 19 time...
80000 done in 21 time...
90000 done in 23 time...
100000 done in 24 time...
110000 done in 26 time...
120000 done in 28 time...
130000 done in 30 time...
140000 done in 32 time...
150000 done in 33 time...
160000 done in 36 time...
170000 done in 39 time...
180000 done in 41 time...
190000 done in 45 time...
200000 done in 48 time...
210000 done in 48 time...
220000 done in 53 time...
230000 done in 56 time...
......and so on.....
I am not sure why the time taken to read the same number of lines (10000) keeps increasing over iterations. Is there a way to avoid this, or a better way to read big files?
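For reference, here is a minimal streaming sketch I was considering, assuming do_something() only needs one line at a time and its result can be written straight back to disk instead of being kept in mydata (output.txt is a hypothetical destination file; the double with statement needs Python 2.7+):

import time

# Sketch: stream results to disk instead of accumulating them in a list,
# so the working set stays roughly constant however long the file is.
# Assumes do_something() handles one line at a time and returns something
# printable; "output.txt" is a hypothetical destination.
st = time.time()
with open("somefile.txt", "r") as fin, open("output.txt", "w") as fout:
    for i, line in enumerate(fin, 1):
        fout.write(str(do_something(line)) + "\n")
        if i % 10000 == 0:
            print "%d done in %d time..." % (i, time.time() - st)
            st = time.time()

If the per-batch time stays flat with this version, the slowdown presumably comes from the growing mydata list (or from do_something itself getting slower over time), not from the file iteration itself.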