I have an input file like this:
a
1,100
2,200
3,300
b
1,100,200
2,200,300
3,300,400
c
...
I want to read the file into multiple data frames, with code like this (to simplify the problem, we assume the number of rows for each table is fixed):
import pandas as pd

with open("file.csv", "r") as f:
    while True:
        table_name = f.readline()
        if table_name:
            table_df = pd.read_csv(f, nrows=3)
            # Do other stuff
        else:
            break
My initial expectation was that pd.read_csv(f, nrows=3) would consume only a limited number of rows from the input stream, and that the next f.readline() call would continue from there. However, it turns out that after the first read_csv call, the stream position of f is set to the end of the file, and I can no longer read from the same stream f. My pandas version is 0.25.0. Is this a bug or expected behaviour? Is there any way to reuse the same input stream to read multiple data frames?
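
For reference, here is a minimal sketch of one possible workaround, not a confirmed fix for the behaviour above. It assumes that the cause is the parser reading ahead of the rows it returns, so it pulls each table's lines off the stream itself and hands pandas only that text through an in-memory buffer. The helper name read_tables and the rows_per_table parameter are hypothetical, and header=None assumes the tables have no header row, as in the sample input.

```python
import io

import pandas as pd


def read_tables(stream, rows_per_table=3):
    """Yield (table_name, DataFrame) pairs from a stream of named, fixed-size tables.

    Hypothetical helper: assumes every table has exactly rows_per_table data rows
    and no header row.
    """
    while True:
        name_line = stream.readline()
        if not name_line:  # empty string means end of stream
            break
        table_name = name_line.strip()
        # Read exactly rows_per_table lines ourselves, so pandas never
        # touches (and never reads ahead on) the shared stream.
        chunk = "".join(stream.readline() for _ in range(rows_per_table))
        yield table_name, pd.read_csv(io.StringIO(chunk), header=None)
```

With the original file, this could be used as: with open("file.csv", "r") as f: tables = dict(read_tables(f)). Because pandas only ever sees a separate io.StringIO object, the position of f advances by exactly one table per iteration.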