I have a 1.4 GB file and I'm trying to iterate over every line. I tried the normal approach and this happened:
    counter = 0
    with open("myfile.txt") as infile:
        for line in infile:
            counter += 1
            if target in line:
                print line
    print counter
    658785
Everything looked fine at first, but then I realized that the count is far lower than it should be, so I wrote this instead:
    counter = 0
    text_file = open("myfile.txt")
    while True:
        line = text_file.readline()
        if not line:
            break
        counter += 1
    print counter
I get the same number of rows, but I know for a fact that this file has over 20 million rows. Does anyone know what I'm doing wrong?
EDIT: It seems people are skeptical about whether I'm reading the right file, how I'm verifying the line count, and so on.
So here is a simple example. If I run this:
    counter = 0
    total_lines = 0
    while True:
        line = text_file.readline()
        total_lines += 1
        if target in line:
            print line.split("|")[0].strip(), counter, total_lines
            counter += 1
This is my output:
    HAIRY MOOSE 0 4722388
    HAIRY MOOSE 1 4722389
    HAIRY MOOSE 2 4722390
    ....
    ....
    IN *HAIRY MOOSES CLEANING 45 12244264
    IN *HAIRY MOOSES OF TU 46 12244265
    IN *HAIRY MOOSES OF TULSA 47 12244266
But if I read it the other way, the loop finishes before a single match is found.
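For anyone who wants to cross-check the line count independently of text-mode reading, here is a minimal sketch that opens the file in binary mode and counts newline bytes. The function name `count_newlines` and the chunk size are my own choices for illustration, and it assumes every line ends with a `\n` byte, so a final line without a trailing newline is not counted.

```python
def count_newlines(path, chunk_size=1024 * 1024):
    """Count newline bytes in a file read as raw bytes (binary mode)."""
    total = 0
    # "rb" reads the bytes exactly as stored, with no text-mode translation.
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:  # an empty read means the real end of the file
                break
            total += chunk.count(b"\n")
    return total

# Example call (same file name as above):
# print(count_newlines("myfile.txt"))
```

If this number differs from what the text-mode loops report, the difference lies in how the file is being read rather than in what it contains.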