I have the feeling that my question is related to "Why does takewhile() skip the first line?", but I haven't found a satisfactory answer there.
My examples below use the following modules:
import csv
from itertools import takewhile
Here is my problem. I have a CSV file which I want to parse using itertools.
For instance, I want to separate the header from the content. The boundary between the two is marked by the presence of a keyword in the first column.
Here is an example file.csv:
a, content
b, content
KEYWORD, something else
c, let's continue
The first two lines make up the header of the file.
The KEYWORD line separates the header from the content, which here is the last line.
Even though it is not strictly part of the content, I also want to parse the separation row.
with open('file.csv', 'rb') as f:
    reader = csv.reader(f)
    header = takewhile(lambda x: x[0] != 'KEYWORD', reader)
    for row in header:
        print(row)
    print('End of header')
    for row in reader:
        print(row)
I was not expecting this, but the KEYWORD line is skipped, as the following output shows:
['a', ' content']
['b', ' content']
End of header
['c', " let's continue"]
I tried simulating the csv reader with a plain list iterator to see whether the problem came from there, but apparently it does not. The following code produces the same behavior.
l = [['a', 'content'],
     ['b', 'content'],
     ['KEYWORD', 'something else'],
     ['c', "let's continue"]]
i = iter(l)
header = takewhile(lambda x: x[0] != 'KEYWORD', i)
for row in header:
    print(row)
print('End of header')
for row in i:
    print(row)
How can I use takewhile while preventing the following for loop from skipping the unparsed line?
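To make the goal concrete, here is a minimal sketch of the behavior I am after, written as a plain loop rather than with takewhile. The helper name split_header and its keyword parameter are made up for this illustration; they are not part of itertools or csv. I would like to get the same result while still using takewhile.

```python
def split_header(rows, keyword='KEYWORD'):
    """Hypothetical helper: collect rows until the keyword row is found.

    Returns (header_rows, separator_row). The separator is handed back
    to the caller instead of being discarded; it is None if the keyword
    never appears.
    """
    header = []
    for row in rows:
        if row[0] == keyword:
            return header, row
        header.append(row)
    return header, None


rows = iter([['a', 'content'],
             ['b', 'content'],
             ['KEYWORD', 'something else'],
             ['c', "let's continue"]])

header, separator = split_header(rows)
content = list(rows)  # whatever is left after the separator row

print(header)
print(separator)
print(content)
```

Here the separation row is still available after the header has been read, which is what the takewhile version loses.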
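As far as I understand, the first for loop calls next on the iterator to test each row against the predicate. When the KEYWORD row fails the test, takewhile stops, but that row has already been pulled from the iterator. The second for loop then calls next once again and gets the following row, so the separation row is skipped.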
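This understanding can be checked with a small experiment, sketched below on the same list as above. The predicate not_keyword and the tested list are names introduced only for this check; the predicate records every row it is called on, so we can see which rows takewhile actually pulled from the iterator.

```python
from itertools import takewhile

rows = iter([['a', 'content'],
             ['b', 'content'],
             ['KEYWORD', 'something else'],
             ['c', "let's continue"]])

tested = []  # first column of every row the predicate was called on


def not_keyword(row):
    # Record the row before testing it, to see what takewhile consumed.
    tested.append(row[0])
    return row[0] != 'KEYWORD'


header = list(takewhile(not_keyword, rows))
remaining = list(rows)

print(tested)     # ['a', 'b', 'KEYWORD']
print(header)     # [['a', 'content'], ['b', 'content']]
print(remaining)  # [['c', "let's continue"]]
```

The KEYWORD row shows up in tested but in neither header nor remaining: it was taken from the iterator to evaluate the predicate and then dropped.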