To run tests, I saved the following text (6.31 MB, around 128,000 lines) to a file AAA.txt:
http://norvig.com/big.txt
Then, with the help of the random module, I turned it into a file BBB.txt by inserting '1234567'
at the start of 1000 randomly chosen lines.
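For anyone who wants to reproduce the test file, here is a minimal sketch of how BBB.txt could be built. This is not the exact script I used: the function name `insert_marker`, its parameters, and the use of `random.sample` to pick distinct lines are assumptions made for illustration. It also assumes the file can be read with the platform's default text encoding.

```python
import random

MARKER = '1234567'

def insert_marker(src_path, dst_path, count=1000, marker=MARKER, seed=None):
    """Copy src_path to dst_path, prefixing `count` random lines with `marker`.

    Hypothetical helper: returns the number of lines that were modified.
    """
    rng = random.Random(seed)
    with open(src_path) as src:
        lines = src.readlines()
    # Pick distinct line indices; never ask for more lines than the file has.
    chosen = set(rng.sample(range(len(lines)), min(count, len(lines))))
    with open(dst_path, 'w') as dst:
        for i, line in enumerate(lines):
            dst.write(marker + line if i in chosen else line)
    return len(chosen)
```

Calling `insert_marker('AAA.txt', 'BBB.txt')` would then produce the modified file; passing a `seed` makes the choice of lines repeatable.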
I tested several solutions on this modified text.
I can't tell which of the following is the fastest, but I think they are all faster than the other solutions I have read on this page and than my own other attempts.
They rely on the fact that the "in" test, 'string' in 'anotherstring',
is extremely fast, so it cheaply rejects most lines before the more expensive position check runs.
    # Variant 1: "in" test first, then startswith()
    def in_and_startswith(x):
        return '1234567' in x and x.startswith('1234567')

    with open('BBB.txt') as f:
        for line in filter(in_and_startswith, f):
            x = 0

    # Variant 2: "in" test first, then find() == 0
    def in_and_find(x):
        return '1234567' in x and x.find('1234567') == 0

    with open('BBB.txt') as f:
        for line in filter(in_and_find, f):
            x = 0

    # Variants 3 and 4: filter with "in" only, check the position in the loop
    def just_in(x):
        return '1234567' in x

    with open('BBB.txt') as f:
        for line in filter(just_in, f):
            if line.startswith('1234567'):
                x = 0

    with open('BBB.txt') as f:
        for line in filter(just_in, f):
            if line.find('1234567') == 0:
                x = 0
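To compare the variants on your own machine, a small timing harness along these lines may help. It is only a sketch: `only_startswith` (a baseline without the preliminary "in" test), `count_matches` and `time_filters` are hypothetical names that do not appear above, and the figures you get will depend on the Python version, the disk cache and the file contents.

```python
import timeit

MARKER = '1234567'

def in_and_startswith(x):
    return MARKER in x and x.startswith(MARKER)

def in_and_find(x):
    return MARKER in x and x.find(MARKER) == 0

def only_startswith(x):
    # Baseline (assumption, not one of the variants above): no "in" test.
    return x.startswith(MARKER)

def count_matches(predicate, path):
    """Return how many lines of `path` satisfy `predicate`."""
    with open(path) as f:
        return sum(1 for _ in filter(predicate, f))

def time_filters(path, repeat=3):
    """Return {predicate name: best time in seconds} for one pass over `path`."""
    results = {}
    for pred in (in_and_startswith, in_and_find, only_startswith):
        timer = timeit.Timer(lambda: count_matches(pred, path))
        # Keep the minimum of several runs to reduce noise from other processes.
        results[pred.__name__] = min(timer.repeat(repeat=repeat, number=1))
    return results
```

For example, `time_filters('BBB.txt')` returns a dictionary of timings that can be printed or sorted. Taking the minimum over a few repeats is a common way to damp noise, but it does not remove the effect of file caching between runs.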
Note that I used the statement x = 0, which has no particular meaning, as the loop body
instead of print(line), because print() takes a long time to execute.
Calling print() repeatedly takes much longer than printing a single string obtained by joining all the strings that must be printed.
Compare the execution times of

    for x in ['hkjh', 'kjhoi', '3135487j', 'kjhskdkfh', '54545779']:
        print(x)

and

    print('\n'.join(x for x in ['hkjh', 'kjhoi', '3135487j', 'kjhskdkfh', '54545779']))

and you'll see the difference.
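One way to measure this comparison is sketched below with timeit. The helper names `print_each`, `print_joined` and `compare` are made up for this example, and the `out` parameter is an assumption added so the output destination can be swapped; the size of the gap depends heavily on where the output goes (terminal, pipe, file or in-memory buffer).

```python
import sys
import timeit

WORDS = ['hkjh', 'kjhoi', '3135487j', 'kjhskdkfh', '54545779']

def print_each(words, out):
    # One print() call per string.
    for x in words:
        print(x, file=out)

def print_joined(words, out):
    # A single print() call on the joined string.
    print('\n'.join(words), file=out)

def compare(words=WORDS, out=None, number=1000):
    """Return (seconds for print_each, seconds for print_joined)."""
    out = sys.stdout if out is None else out
    t_each = timeit.timeit(lambda: print_each(words, out), number=number)
    t_joined = timeit.timeit(lambda: print_joined(words, out), number=number)
    return t_each, t_joined
```

Calling `compare()` writes to the real standard output, which is the case discussed above; passing an `io.StringIO()` as `out` keeps the screen quiet but measures only the Python-side overhead.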