I need to process a very large CSV file.
During processing, the first line needs some special handling.
The obvious approach is to check the value in every CSV line, but that means a string compare for each of the roughly 200,000 lines.
Another option is to set a boolean flag and test it first in the condition, so that short-circuit evaluation skips the string compare once the flag has been set.
Both options are below:
```python
import csv


def do_extra_processing():
    pass  # special handling for the "SomeValue" line


def do_normal_processing():
    pass  # work done for every line


def process_option_1(lines):
    for line in lines:
        # Option 1: every one of the ~200,000 lines does the string compare
        if line[1] == "SomeValue":
            do_extra_processing()
        do_normal_processing()


def process_option_2(lines):
    line_checked: bool = False
    for line in lines:
        # Option 2: once line_checked is True, `not line_checked` is False,
        # so short-circuit evaluation skips the string compare
        if not line_checked and line[1] == "SomeValue":
            do_extra_processing()
            line_checked = True
        do_normal_processing()


if __name__ == "__main__":
    with open('file.csv', newline='') as csvfile:
        lines = csv.reader(csvfile, delimiter=';')
        process_option_2(lines)  # or: process_option_1(lines)
```
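I've checked that Python short-circuits boolean operators: in an 'or' expression the second operand is not evaluated when the first is True, and in an 'and' expression it is not evaluated when the first is False.
The boolean is initialized just above the for-loop and set to True inside the if-block once the extra processing has run.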
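The short-circuit behavior can be checked with a small counting stand-in for the string compare (`CountingCompare` and the sample rows are hypothetical). It also exposes a subtlety of putting the flag in an 'or': once the flag is True the whole condition is True, so the guarded block runs for every remaining row. Testing `not line_checked and ...` skips the compare in the same way but runs the block only for the matching row.

```python
class CountingCompare:
    """Hypothetical stand-in for line[1] == "SomeValue" that counts its calls."""

    def __init__(self, target: str) -> None:
        self.target = target
        self.calls = 0

    def __call__(self, value: str) -> bool:
        self.calls += 1
        return value == self.target


rows = [["a", "SomeValue"], ["b", "other"], ["c", "other"]]

# Flag tested with `or`: the compare is skipped once the flag is True,
# but the condition is then True for every remaining row.
compare = CountingCompare("SomeValue")
line_checked = False
extra_runs_or = 0
for row in rows:
    if line_checked or compare(row[1]):
        extra_runs_or += 1
        line_checked = True
compares_or = compare.calls

# Flag tested with `and`: the compare is skipped once the flag is True,
# and the special branch runs only for the matching row.
compare = CountingCompare("SomeValue")
line_checked = False
extra_runs_and = 0
for row in rows:
    if not line_checked and compare(row[1]):
        extra_runs_and += 1
        line_checked = True
compares_and = compare.calls

print(compares_or, extra_runs_or)    # 1 3
print(compares_and, extra_runs_and)  # 1 1
```

In both loops the string compare runs only once; the difference is how often the special branch fires.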
The question is: is the second option, with the boolean compare, significantly faster?
(No need to convert to , so different question than 37615264 )
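To answer the question empirically, a `timeit` sketch along these lines could compare the two loops. The synthetic `ROWS` data and the `option_*` functions are hypothetical stand-ins, with a simple counter replacing `do_normal_processing()`. A third variant, `option_next`, swaps in a different technique: it pulls the first row off the iterator with `next()` and runs the main loop with no per-row check at all, which only applies if the special row is always the physical first row of the file.

```python
import timeit

# Synthetic stand-in for csv.reader output (hypothetical data): 200,000 rows,
# only the first one carrying "SomeValue" in column 1.
ROWS = [["0", "SomeValue"]] + [[str(i), "other"] for i in range(1, 200_000)]


def option_1(rows):
    hits = total = 0
    for line in rows:
        if line[1] == "SomeValue":  # string compare on every row
            hits += 1
        total += 1  # stand-in for do_normal_processing()
    return hits, total


def option_2(rows):
    hits = total = 0
    line_checked = False
    for line in rows:
        # the flag short-circuits the string compare after the first match
        if not line_checked and line[1] == "SomeValue":
            hits += 1
            line_checked = True
        total += 1
    return hits, total


def option_next(rows):
    # Swapped-in technique: assumes the special row is always the first row.
    it = iter(rows)
    first = next(it, None)
    if first is None:
        return 0, 0
    hits = 1 if first[1] == "SomeValue" else 0
    total = 1
    for _ in it:  # no per-row check in the main loop
        total += 1
    return hits, total


if __name__ == "__main__":
    for fn in (option_1, option_2, option_next):
        best = min(timeit.repeat(lambda: fn(ROWS), number=3, repeat=3))
        print(f"{fn.__name__}: {best:.4f} s for 3 passes")
```

Because `csv.reader` returns an iterator, `next(lines)` works on it directly in the real script. Absolute timings depend on the machine and the data, and whatever `do_normal_processing()` does per row is likely to cost far more than a single string compare.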