The code has to look at every line; there is no other way for a program to know which lines contain a substring.
So we have to find a way to iterate over the lines faster.
Here is how we could make it more efficient:
The call to index slows down your code, since each call searches the list again from the beginning. So the first optimization would be to use enumerate:
file = "file.csv"
to_find = "satoshi"

with open(file) as f:
    lines = f.readlines()

# Collect the index of every line that contains the substring.
row_number = []
for index, line in enumerate(lines):
    if to_find in line:
        row_number.append(index)

# Note: indexing raises IndexError if no line matched.
first_line_index = row_number[0]
last_line_index = row_number[-1]
first_line = lines[first_line_index]
last_line = lines[last_line_index]
print(first_line)
print(last_line)
Another optimization is to use a list comprehension, which is usually a bit faster than appending in an explicit loop:
file = "file.csv"
to_find = "satoshi"

with open(file) as f:
    lines = f.readlines()

# Indexes of all lines that contain the substring.
lines_indexes = [index for index, line in enumerate(lines) if to_find in line]

first_line_index = lines_indexes[0]
last_line_index = lines_indexes[-1]
first_line = lines[first_line_index]
last_line = lines[last_line_index]
print(first_line)
print(last_line)
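If the file is large, a further variation is to skip readlines() and keep only the first and last matching lines while streaming through the file. This is only a minimal sketch, not something I benchmarked; the helper name first_and_last_match is made up for illustration, and it takes any iterable of lines, so an open file object works too.

```python
def first_and_last_match(lines, to_find):
    """Return (first, last) lines containing to_find, or (None, None)."""
    first = last = None
    for line in lines:
        if to_find in line:
            if first is None:
                first = line  # remember only the first match
            last = line       # overwritten by every later match
    return first, last


# With a real file (same file name as in the code above):
# with open("file.csv") as f:
#     first_line, last_line = first_and_last_match(f, "satoshi")

# Small in-memory sample standing in for the file (hypothetical data).
sample = ["abcd\n", "abcd satoshi abcd 1\n", "abcd\n", "abcd satoshi abcd 4\n", "abcd\n"]
first_line, last_line = first_and_last_match(sample, "satoshi")
print(first_line, end="")
print(last_line, end="")
```

The whole file is still read once, but no list of lines or indexes is kept in memory, and a file without any match gives (None, None) instead of an IndexError.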
Following the comments, here is an edit:
I wrote:
Note that the answer of @Janez Kuhar does iterate over all lines, because the else if statement has, as written, no effect on the iteration. Also, Python has no else if keyword, only elif. That is a design error too, because the elif statement has no relation to the if statement here. Third note: with this code, you will not get the first and the last line, but only the first line. Lastly, there is a problem with using index: if one of your files contains identical lines, the call to index will, unfortunately, always return the position of the first of those lines.
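To illustrate the last point, here is a small sketch with made-up data in which two matching lines are identical. It only shows the behaviour of list.index compared with enumerate; it is not taken from either answer.

```python
# Hypothetical data: the lines at positions 0 and 2 are identical.
lines = ["abcd satoshi\n", "abcd\n", "abcd satoshi\n"]

# list.index returns the position of the first equal element,
# so both matching lines are reported at position 0.
via_index = [lines.index(line) for line in lines if "satoshi" in line]

# enumerate yields the actual position of each line.
via_enumerate = [i for i, line in enumerate(lines) if "satoshi" in line]

print(via_index)      # [0, 0]
print(via_enumerate)  # [0, 2]
```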
Regarding the problem of not getting the real last line containing the substring:
That's the case if the files are structured like this:
abcd
abcd
abcd
abcd
abcd satoshi abcd 1
abcd satoshi abcd 2
abcd satoshi abcd 3
abcd
abcd satoshi abcd 4
abcd
abcd
abcd
Here, you will get the line ending in 3 as the last line containing the substring, but it should be the line ending in 4.
But, if your files have this structure:
abcd
abcd
abcd
abcd
abcd satoshi abcd 1
abcd satoshi abcd 2
abcd satoshi abcd 3
abcd satoshi abcd 4
abcd
abcd
abcd
When, as here, all the lines containing the substring always follow one another, @Janez Kuhar's code does return the real last line. And of course, in this case there is no need to iterate over all the lines.
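The early exit can be sketched like this. To be clear, this is not @Janez Kuhar's code, just my own illustration of the idea; the function name first_and_last_contiguous is made up, and the two lists reproduce the two file structures shown above (shortened).

```python
def first_and_last_contiguous(lines, to_find):
    """Assumes all matching lines are consecutive; stops once the block ends."""
    first = last = None
    for line in lines:
        if to_find in line:
            if first is None:
                first = line
            last = line
        elif first is not None:
            break  # first non-matching line after the block: stop reading
    return first, last


contiguous = ["abcd", "abcd satoshi abcd 1", "abcd satoshi abcd 2",
              "abcd satoshi abcd 3", "abcd satoshi abcd 4", "abcd"]
scattered = ["abcd", "abcd satoshi abcd 1", "abcd satoshi abcd 2",
             "abcd satoshi abcd 3", "abcd", "abcd satoshi abcd 4", "abcd"]

print(first_and_last_contiguous(contiguous, "satoshi"))
# ('abcd satoshi abcd 1', 'abcd satoshi abcd 4')
print(first_and_last_contiguous(scattered, "satoshi"))
# ('abcd satoshi abcd 1', 'abcd satoshi abcd 3')
```

With the scattered structure the loop stops too early and misses the line ending in 4, which is exactly the problem described above. The early exit is only safe when the matching lines are guaranteed to be consecutive.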
It was unclear to me that the lines would always be one after the other, as @Janez Kuhar pointed out. I thought there could be other lines (not containing the substring) in between, even if the matches appear in a specific part of the file.
And by the way, I'm glad we had this constructive and instructive debate!