I'm trying to write a function that backfills the columns of a DataFrame that meet a condition. The backfill should only be done within groups. However, I am having a hard time getting the groupby object to ungroup. I have tried reset_index, as in the example below, but that raises an AttributeError.

Accessing the original DataFrame through result.obj doesn't give the updated values either, because the groupby bfill has no inplace option.

def upfill(df:DataFrameGroupBy)->DataFrameGroupBy:
    for column in df.obj.columns:
        if column.startswith("x"):
            df[column].bfill(axis="rows", inplace=True)
    return df 

Assigning to the DataFrame column inside the function doesn't work either, because a groupby object doesn't support item assignment.

def upfill(df:DataFrameGroupBy)->DataFrameGroupBy:
    for column in df.obj.columns:
        if column.startswith("x"):
            df[column] = df[column].bfill()
    return df 

The test I'm trying to get to pass:


def test_upfill():
    df = DataFrame({
        "id":[1,2,3,4,5],
        "group":[1,2,2,3,3],
        "x_value": [4,4,None,None,5],
    })
    grouped_df = df.groupby("group")
    result = upfill(grouped_df)
    result.reset_index()
    assert result["x_value"].equals(Series([4,4,None,5,5]))


jhylands
  • What you want to achieve is unclear. Can you add a simple input/output example? – mozway Nov 25 '22 at 13:08
  • So to add some explanation to the in/out of the test. The expected values of the "x_value" column should be upfilled inside of the groups. The None value remains in the expected series because in group 2 we have the values 4,None. The None doesn't get filled because it is after the 4. In group 3 we have None, 5. The None should get back filled since it is before the 5. – jhylands Nov 25 '22 at 13:18

2 Answers

Use the transform method on the grouped DataFrame, like this:

import pandas as pd

def test_upfill():
    df = pd.DataFrame({
        "id":[1,2,3,4,5],
        "group":[1,2,2,3,3],
        "x_value": [4,4,None,None,5],
    })
    result = df.groupby("group").transform(lambda x: x.bfill())
    assert result["x_value"].equals(pd.Series([4,4,None,5,5]))

test_upfill()

More information about the transform method on GroupBy objects can be found in the pandas documentation.
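As a side note, a groupby object also exposes bfill directly, so the lambda is not strictly needed. The following is only a minimal sketch using the data from the question; the variable names filled and via_transform are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "id": [1, 2, 3, 4, 5],
    "group": [1, 2, 2, 3, 3],
    "x_value": [4, 4, None, None, 5],
})

# Backfill x_value within each group; the result keeps the original row index.
filled = df.groupby("group")["x_value"].bfill()

# The transform-based form from the answer above, limited to the same column.
via_transform = df.groupby("group")["x_value"].transform(lambda s: s.bfill())
```

Both calls return a Series aligned with the rows of df, so either one can be assigned back to a column of the original DataFrame.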

Based on the accepted answer, this is the full solution I arrived at, although I have read elsewhere that there are issues with using the obj attribute.

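from pandas import DataFrame, Series
from pandas.core.groupby import DataFrameGroupBy

def upfill(df: DataFrameGroupBy) -> DataFrameGroupBy:
    # Columns of the underlying DataFrame whose names start with "x".
    columns = [column for column in df.obj.columns if column.startswith("x")]
    # Backfill within each group and write the result back to the original DataFrame.
    df.obj[columns] = df[columns].transform(lambda x: x.bfill())
    return df

def test_upfill():
    df = DataFrame({
        "id": [1, 2, 3, 4, 5],
        "group": [1, 2, 2, 3, 3],
        "x_value": [4, 4, None, None, 5],
    })
    grouped_df = df.groupby("group")
    result = upfill(grouped_df)
    assert df["x_value"].equals(Series([4, 4, None, 5, 5]))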
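If relying on the obj attribute is a concern, one possible alternative is to pass the DataFrame itself and return a new one. This is only a sketch: the function name upfill_frame and its group_col and prefix parameters are hypothetical, and it assumes the column names are strings.

```python
import pandas as pd

def upfill_frame(df: pd.DataFrame, group_col: str, prefix: str = "x") -> pd.DataFrame:
    """Return a copy of df with the prefix-named columns backfilled within each group."""
    # Hypothetical helper: pick the columns to fill, never the grouping column itself.
    columns = [c for c in df.columns if c.startswith(prefix) and c != group_col]
    out = df.copy()
    if columns:
        # GroupBy.bfill keeps the original row index, so assignment aligns row by row.
        out[columns] = df.groupby(group_col)[columns].bfill()
    return out

df = pd.DataFrame({
    "id": [1, 2, 3, 4, 5],
    "group": [1, 2, 2, 3, 3],
    "x_value": [4, 4, None, None, 5],
})
result = upfill_frame(df, "group")
```

Because the function works on a copy, the input DataFrame is left untouched and the caller decides whether to replace it with the result.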

jhylands