Consider a dataframe that looks like this:
Col1   Col2    Col3
ABC    00012   Hey!
A1C    00012   Hello!
AAA    00012   Hello
ABC    00033   Hey!
A1C    00021   Hi!
AAA    00021   Hey...
B3Y    00002   Hi.
A5I    00002   Hey?
EAA    00002   Hey!
The Col2 values are already packed together (not necessarily sorted by value, but at least grouped on consecutive lines). My goal is to iterate over the smaller dataframes that correspond to the runs of lines sharing the same Col2 value.
One way of doing this would be:
import pandas as pd

df = pd.read_csv(myfile, sep="\t")
keys = pd.unique(df["Col2"])
for key in keys:
    subdf = df[df["Col2"] == key]
    myfunction(subdf)  # Do something with the sub-dataframe
However, this is very inefficient: every iteration filters the entire dataframe, so the total work grows with the number of distinct keys times the number of rows, even though each group occupies a single contiguous run of lines.
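To make the cost concrete, here is a minimal sketch on synthetic data (the sizes and column contents are made up for illustration) that counts how many row comparisons the loop above performs:

```python
import numpy as np
import pandas as pd

# Synthetic data mimicking the structure above: consecutive runs of Col2 values.
n_groups, rows_per_group = 1000, 10
df = pd.DataFrame({
    "Col2": np.repeat(np.arange(n_groups), rows_per_group),
    "Col3": "x",
})

# The loop scans all len(df) rows once per key, so the total work is
# n_groups * len(df) comparisons, despite the groups being consecutive.
keys = pd.unique(df["Col2"])
total_compared = 0
for key in keys:
    mask = df["Col2"] == key   # full-column comparison on every iteration
    total_compared += len(mask)
    subdf = df[mask]

print(total_compared)  # 1000 keys * 10000 rows = 10000000 comparisons
```

A single forward pass over the 10,000 rows would suffice, yet this approach touches each row 1,000 times.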
QUESTION: How can this code be made more efficient, leveraging the fact that the Col2 values are already grouped on consecutive lines?