I'm trying to paper over missing data in a dataframe by grouping on one column and then flood-filling (ffill().bfill()) subsets of columns inside the groups.
I was previously using
def ffbf(x):
    return x.ffill().bfill()
df[some_cols] = df.groupby(group_key)[some_cols].transform(ffbf)
but transform becomes unbelievably slow even on relatively small dataframes (already several seconds for only 3000x20), so I wanted to see whether I could apply ffill and bfill directly to the groups, since they're supposed to be cythonized now.
Am I correct in thinking that I need to invoke groupby again in between ffill and bfill because neither method preserves the groupings?
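To make that question concrete, here is a minimal sketch on made-up data (the column names `key` and `x` are hypothetical, not from my real frame). As far as I can tell, a grouped ffill hands back an ordinary DataFrame, so a plain bfill afterwards can pull values across group boundaries when the groups are interleaved:

```python
import numpy as np
import pandas as pd

# Toy frame with interleaved groups; "key" and "x" are made-up names.
df = pd.DataFrame({
    "key": ["a", "b", "b", "a"],
    "x": [np.nan, 5.0, np.nan, 1.0],
})
key = df["key"]  # pass the key as a Series so it aligns on the index

# Grouped forward fill returns a plain DataFrame, not a GroupBy object.
step1 = df[["x"]].groupby(key).ffill()
print(type(step1).__name__)  # DataFrame

# Plain bfill ignores the groups: row 0 (group "a") is filled from group "b".
ungrouped = step1.bfill()
print(ungrouped["x"].tolist())  # [5.0, 5.0, 5.0, 1.0]

# Grouping again keeps the backward fill inside each group.
regrouped = step1.groupby(key).bfill()
print(regrouped["x"].tolist())  # [1.0, 5.0, 5.0, 1.0]
```

So in this toy case, at least, the second groupby does seem to be needed.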
Right now I have
df[some_cols] = df[some_cols].groupby(group_key).ffill().groupby(group_key).bfill()
and I think that it's doing what I want, and it's waaaaaaayyy faster than using transform, but I'm not experienced enough with pandas to be certain, so I figured I'd ask.
[edit] It looks like this change is jumbling my data. Why?
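For anyone who wants to reproduce this, below is a rough sketch of how I'd compare the chained version against the transform version on a small frame. The data and the names `key`, `x`, `y` are invented for illustration, and passing the key as a Series (rather than a column label) is an assumption on my part, since the grouping column is no longer present after selecting `some_cols`:

```python
import numpy as np
import pandas as pd

# Invented example data; the real frame is about 3000x20.
df = pd.DataFrame({
    "key": ["a", "b", "a", "b", "a", "b"],
    "x": [np.nan, 1.0, 2.0, np.nan, np.nan, 3.0],
    "y": [np.nan, np.nan, 5.0, 6.0, np.nan, np.nan],
})
some_cols = ["x", "y"]
key = df["key"]  # assumption: group by a Series aligned on the index

# Reference result from the slow transform-based approach.
expected = df.groupby(key)[some_cols].transform(lambda g: g.ffill().bfill())

# Chained grouped fills, regrouping between the two steps.
filled = df[some_cols].groupby(key).ffill().groupby(key).bfill()

# If either check fails, the rows or values differ from the transform result.
print(filled.index.equals(df.index))
print(np.allclose(filled.to_numpy(), expected.to_numpy()))
```

If the index check fails on a given pandas version, the rows are coming back in a different order than they went in, which is the kind of discrepancy I'd want to rule out first.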