
I have a dataframe with a column of polynomials stored as numpy poly1d objects. I want to group the dataframe and then, within each group, sum the polynomial coefficients and divide by the number of rows in the group (that is, average them).

However, I'm having difficulty working with the poly1d objects.

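import pandas as pd

def agg_coeffs(df):
    g_all = pd.DataFrame()

    for key, g in df.groupby(['A', 'B']):
        # sum() of the group's poly1d objects divided by the row count is the mean polynomial;
        # list() then unpacks that poly1d into its individual coefficients
        agg_coeffs = pd.DataFrame({"agg coeffs": list(sum(g['coeffs']) / len(g['coeffs']))})
        g_all = pd.concat([g_all, agg_coeffs])

    return g_all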

The function above outputs the averaged coefficients in separate rows, but I want them all in the same row (one row per group), and I want them to remain poly1d objects (not an array or list).

Incorrect output:

    agg coeffs
0   1.91
1   88.76
2   2.5
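The separate rows can be reproduced without pandas. A minimal sketch with made-up polynomials (not the question's data): averaging poly1d objects with sum()/len() already yields a poly1d, and it is the list(...) call that unpacks it into one value per coefficient, hence one DataFrame row per coefficient.

```python
import numpy as np

# Two made-up degree-2 polynomials standing in for one group's "coeffs" cells
polys = [np.poly1d([1.0, 2.0, 3.0]), np.poly1d([3.0, 4.0, 5.0])]

# sum() starts at 0; 0 + poly1d is handled by poly1d.__radd__, and dividing a
# poly1d by a scalar divides every coefficient, so the mean is still a poly1d
avg = sum(polys) / len(polys)
print(type(avg).__name__)  # poly1d
print(avg.coeffs)          # [2. 3. 4.]

# Iterating a poly1d yields its coefficients, so list(avg) is three separate
# floats; a DataFrame column built from that list gets one row per coefficient
as_list = list(avg)
print(len(as_list))        # 3
```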

Example dataframe:

                 coeffs                             A           B       
        10227   [0.0767614738203, 91.6253393665]    2016        p1  
        10311   [4.47454751131, 44.9313348416]      2016        p2  
        10367   [2.38170652877, 133.884680026]      2016        p3  
        10309   [0.736288998358, 84.6403688266]     2016        p4

Note: each cell in the "coeffs" column is a poly1d object, so printing a single cell gives, for example, 0.0767614738203 x^2 + 91.6253393665 x + 10. (The intercept is not displayed in the dataframe, but it is present when the cell is extracted from the df.)
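The hidden intercept can be checked by inspecting a poly1d directly. A short sketch using the first cell's coefficients, with the intercept of 10 taken from the note: poly1d indexes by power of x, so p[0] is the constant term, and p.coeffs holds every coefficient, highest power first.

```python
import numpy as np

# The first "coeffs" cell from the example, including the intercept of 10
p = np.poly1d([0.0767614738203, 91.6253393665, 10])

print(p)         # poly1d's own two-line display (exponent on the line above the x)
print(p.coeffs)  # the full coefficient array, highest power first
print(p.order)   # 2
print(p[0])      # indexing is by power of x, so p[0] is the intercept: 10.0
print(p(0))      # evaluating at x = 0 also gives the intercept: 10.0
```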

Expected output:

                 coeffs         A           B       
        0       [1.91, 88.76]   2016        p1  

Changing the function to this (removing the list wrapper and assigning the result to a new column of each group):

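import pandas as pd

def agg_coeffs(df):
    g_all = pd.DataFrame()

    for key, g in df.groupby(['A', 'B']):
        # Try to store the averaged poly1d in a new column of the group
        g.loc[:, 'agg coeffs'] = sum(g['coeffs']) / len(g['coeffs'])  # raises the ValueError below
        g_all = pd.concat([g_all, g])

    return g_all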

This results in the following error on the line g.loc[:, 'agg coeffs'] = sum(g['coeffs']) / len(g['coeffs']):

ValueError: Must have equal len keys and value when setting with an iterable
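A likely explanation, sketched under the assumption that pandas decides how to broadcast the right-hand side of a .loc assignment by checking whether the value is list-like: poly1d defines __iter__, so pandas treats it as a sequence of coefficients rather than as one value, and that sequence does not match the number of rows in the group (in the example every (A, B) pair is unique, so each group has a single row). The length is also ambiguous, because len() on a poly1d returns its order, not its coefficient count.

```python
import numpy as np
import pandas as pd

p = np.poly1d([1.0, 2.0, 3.0])

# poly1d implements __iter__, so pandas classifies it as list-like, not as a scalar
print(pd.api.types.is_list_like(p))  # True

# Iterating yields all 3 coefficients...
print(len(list(p)))  # 3
# ...while len() on a poly1d returns its order (degree), not the coefficient count
print(len(p))        # 2
```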

doyz

1 Answer

Split coeffs into regular numeric columns:

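import numpy as np

# Create the three target columns first, then fill them from the coefficient arrays
df['c1'] = df['c2'] = df['c3'] = np.nan
# .c is poly1d's short alias for .coeffs; this assumes every polynomial has exactly 3 coefficients
df[['c1', 'c2', 'c3']] = [x.c for x in df['coeffs']]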
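The split step assumes every poly1d has exactly three coefficients. Real data can break that, because poly1d drops leading zero coefficients (np.poly1d([0, 4, 5]) stores only [4, 5]). A sketch of a padding helper; coeff_matrix and its width parameter are hypothetical names, not part of numpy or pandas:

```python
import numpy as np

def coeff_matrix(polys, width=3):
    """Stack poly1d coefficients into a (len(polys), width) float array.

    Hypothetical helper. poly1d drops leading zero coefficients, so a
    polynomial whose x**2 coefficient is 0 stores only 2 coefficients;
    left-pad with zeros so that, with width=3, every row is [x**2, x, constant].
    """
    rows = []
    for p in polys:
        c = np.asarray(p.coeffs, dtype=float)
        if len(c) > width:
            raise ValueError(f"degree {p.order} does not fit in {width} columns")
        rows.append(np.concatenate([np.zeros(width - len(c)), c]))
    return np.vstack(rows)

m = coeff_matrix([np.poly1d([1.0, 2.0, 3.0]), np.poly1d([0.0, 4.0, 5.0])])
print(m)
# [[1. 2. 3.]
#  [0. 4. 5.]]
```

With the question's column name, the split would then presumably be df[['c1', 'c2', 'c3']] = coeff_matrix(df['coeffs']) once the columns exist.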

Then groupby and agg:

grouped = df.groupby('A', as_index=False)
df2 = grouped.agg({'B':'first', 'c1':'mean', 'c2':'mean', 'c3':'mean'})

Gives you:

      A   B        c1         c2   c3
0  2016  p1  1.917326  88.770431  2.5
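The same aggregation can also be written with pandas named aggregation (available since pandas 0.25), which names each output column explicitly. A minimal sketch on already-split columns: c1 and c2 are the values from the question's example, while the c3 intercepts are hypothetical apart from the first one (10), chosen so their mean matches the 2.5 shown above.

```python
import pandas as pd

# Already-split coefficient columns; the c3 values other than 10.0 are hypothetical
df = pd.DataFrame({
    'A': [2016] * 4,
    'B': ['p1', 'p2', 'p3', 'p4'],
    'c1': [0.0767614738203, 4.47454751131, 2.38170652877, 0.736288998358],
    'c2': [91.6253393665, 44.9313348416, 133.884680026, 84.6403688266],
    'c3': [10.0, 0.0, 0.0, 0.0],
})

# Named aggregation: output_name=(input_column, aggregation)
df2 = df.groupby('A', as_index=False).agg(
    B=('B', 'first'),
    c1=('c1', 'mean'),
    c2=('c2', 'mean'),
    c3=('c3', 'mean'),
)
print(df2)
```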

Then combine back to poly1d objects:

df2['coeff'] = df2[['c1','c2','c3']].apply(np.poly1d, axis=1)

Gives you:

      A   B        c1         c2   c3                           coeff
0  2016  p1  1.917326  88.770431  1.0  [1.91732612805, 88.7704307652]
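Rebuilding a poly1d from the column means is valid because averaging is linear in the coefficients: the mean of the coefficient vectors is exactly the coefficient vector of the mean polynomial. A quick numerical check with made-up polynomials:

```python
import numpy as np

# Made-up polynomials standing in for one group's poly1d cells
polys = [np.poly1d([1.0, 2.0, 3.0]),
         np.poly1d([3.0, 4.0, 5.0]),
         np.poly1d([2.0, 0.0, 1.0])]

# Column-wise mean of the coefficient matrix (what the groupby/mean step computes)
mean_coeffs = np.vstack([p.coeffs for p in polys]).mean(axis=0)
rebuilt = np.poly1d(mean_coeffs)

# Averaging the poly1d objects directly gives the same polynomial...
direct = sum(polys) / len(polys)
print(np.allclose(rebuilt.coeffs, direct.coeffs))  # True

# ...and it equals the pointwise average of the originals at any x
xs = np.linspace(-2.0, 2.0, 5)
print(np.allclose(rebuilt(xs), np.mean([p(xs) for p in polys], axis=0)))  # True
```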

Be careful, though: poly1d keeps a reference to the array it is given rather than a copy, so if you drop the c1/c2/c3 columns, the values in coeff can become corrupted. If that matters, copy each row's values before passing them to np.poly1d inside the apply.
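The reference behavior can be checked in isolation. A minimal sketch with a plain numpy array standing in for one row of c1/c2/c3: mutating the original array shows up in the poly1d built from it, but not in one built from a copy.

```python
import numpy as np

# Stand-in for one row of aggregated coefficients
coeffs = np.array([1.917326, 88.770431, 2.5])

# Passing the array directly: the poly1d keeps a view of `coeffs`, not a copy
shared = np.poly1d(coeffs)
# Passing an explicit copy: the poly1d owns independent data
independent = np.poly1d(coeffs.copy())

coeffs[2] = -1.0              # mutate the original buffer
print(shared.coeffs[2])       # -1.0: the change leaks into the shared poly1d
print(independent.coeffs[2])  # 2.5: unaffected
```

In the answer's apply call, the same idea would be something like apply(lambda row: np.poly1d(row.to_numpy(copy=True)), axis=1); this is a suggested variant, not tested against a particular pandas version.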

John Zwinck