Consider a dataframe that looks like this:

Col1   Col2     Col3
ABC    00012    Hey!
A1C    00012    Hello!
AAA    00012    Hello
ABC    00033    Hey!
A1C    00021    Hi!
AAA    00021    Hey...
B3Y    00002    Hi.
A5I    00002    Hey?
EAA    00002    Hey!

The Col2 values are already packed together (not necessarily sorted, but rows with the same value sit on consecutive lines). My goal is to iterate over the smaller dataframes that correspond to the rows sharing the same Col2 value.


One way of doing this would be:

import pandas as pd

df = pd.read_csv(myfile, sep="\t")
keys = pd.unique(df["Col2"])
for key in keys:
    subdf = df[df["Col2"] == key]
    myfunction(subdf)  # do something with the sub-dataframe

However, this is inefficient: each iteration filters the entire dataframe, so the total cost grows with the number of distinct keys times the number of rows.
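One way to avoid the per-key filtering (a sketch, not from the original post, using a small dataframe mirroring the example table) is `df.groupby("Col2", sort=False)`. It builds the key-to-rows index in a single pass and then yields each sub-dataframe directly; note it groups by value, so it does not strictly rely on the keys being consecutive:

```python
import pandas as pd

# Hypothetical data mirroring the example table above.
df = pd.DataFrame({
    "Col1": ["ABC", "A1C", "AAA", "ABC", "A1C", "AAA", "B3Y", "A5I", "EAA"],
    "Col2": ["00012", "00012", "00012", "00033", "00021", "00021",
             "00002", "00002", "00002"],
    "Col3": ["Hey!", "Hello!", "Hello", "Hey!", "Hi!", "Hey...",
             "Hi.", "Hey?", "Hey!"],
})

# groupby(sort=False) indexes the rows once and preserves the order in
# which each key first appears, instead of rescanning the frame per key.
group_sizes = []
for key, subdf in df.groupby("Col2", sort=False):
    group_sizes.append((key, len(subdf)))  # stand-in for myfunction(subdf)
```

With the example data, the groups come out in first-appearance order: `00012`, `00033`, `00021`, `00002`.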


QUESTION: How can I make this code more efficient and leverage the fact that the Col2 values are already grouped on consecutive lines?
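To explicitly exploit the consecutive layout, one sketch (my assumption, using the same hypothetical example data) derives a block id from the positions where Col2 changes and groups on that. Unlike grouping on the values themselves, this also keeps a key that reappears later in a separate run:

```python
import pandas as pd

# Hypothetical data mirroring the example table above.
df = pd.DataFrame({
    "Col1": ["ABC", "A1C", "AAA", "ABC", "A1C", "AAA", "B3Y", "A5I", "EAA"],
    "Col2": ["00012", "00012", "00012", "00033", "00021", "00021",
             "00002", "00002", "00002"],
    "Col3": ["Hey!", "Hello!", "Hello", "Hey!", "Hi!", "Hey...",
             "Hi.", "Hey?", "Hey!"],
})

# True wherever Col2 differs from the previous row; the cumulative sum
# then assigns one distinct id to each consecutive run of equal keys.
block_id = df["Col2"].ne(df["Col2"].shift()).cumsum()

# A single-pass groupby on the block ids yields one sub-dataframe per run.
blocks = [subdf for _, subdf in df.groupby(block_id, sort=False)]
```

Each element of `blocks` is a contiguous slice of the original dataframe, ready to pass to a function like `myfunction` in the question.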

Vincent
