I'm trying to refactor the following pseudo-code so that, instead of reading from and writing to files, it works entirely in memory using pandas, but I'm confused about how iterating over the `with` file handle compares to looping over a pandas DataFrame.

This is the code I would like to refactor:
```python
results = []
with open('data.csv', 'rt') as ins:
    next(ins)  # drop header
    a1, b1, c1 = next(ins).strip().split(',')
    for i, line in enumerate(ins, 2):
        a2, b2, c2 = line.strip().split(',')
        ...
        results.append(dummy_func(a1, b1, c1))
    else:
        results.append(dummy_func(a1, b1, c1))
```
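For context, here is a self-contained version of that pattern; the `dummy_func` body and the inline sample data are stand-ins I made up, not my real code:

```python
import io

def dummy_func(a, b, c):
    # placeholder for the real computation
    return (a, b, c)

# stands in for the contents of data.csv
sample = "a,b,c\n1,2,3\n4,5,6\n7,8,9\n"

results = []
with io.StringIO(sample) as ins:  # stands in for open('data.csv', 'rt')
    next(ins)  # drop header
    a1, b1, c1 = next(ins).strip().split(',')  # first data row
    for i, line in enumerate(ins, 2):
        a2, b2, c2 = line.strip().split(',')   # remaining rows
        results.append(dummy_func(a2, b2, c2))
    else:
        # for/else: runs once after the loop finishes without break
        results.append(dummy_func(a1, b1, c1))

print(results)
```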
Is the following the in-memory equivalent? In particular, I'm not sure whether `ins` yields the lines of the file, and whether I need `itertuples()` in both loops. On a side note, is `itertuples()` the best thing to use here; is it faster than `iterrows()`, for example?
```python
import pandas as pd

df = pd.read_csv('data.csv', sep=',')
results = []
for row in df.itertuples():
    a1, b1, c1 = row.a, row.b, row.c
    for row2 in df.loc[2:].itertuples():
        a2, b2, c2 = row2.a, row2.b, row2.c
        ...
        results.append(dummy_func(a1, b1, c1))
    else:
        results.append(dummy_func(a1, b1, c1))
```
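To show what I mean by the `itertuples()` vs `iterrows()` question, here is a minimal sketch on a made-up two-row frame; as I understand it, `itertuples()` yields namedtuples and is generally faster, while `iterrows()` yields `(index, Series)` pairs:

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 4], 'b': [2, 5], 'c': [3, 6]})

# itertuples(): each row is a namedtuple, fields accessed as attributes
tuples = [(row.a, row.b, row.c) for row in df.itertuples()]

# iterrows(): each row is a (index, Series) pair, fields accessed by label
series_rows = [(row['a'], row['b'], row['c']) for idx, row in df.iterrows()]

print(tuples)       # [(1, 2, 3), (4, 5, 6)]
print(series_rows)
```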