2

I'm trying to sum all the value where date1 matches date2 for each row in my dataframe, which is similar to this question.

id        value       date1       date2   sum
A         150      4/8/2014    3/8/2014   nan
B         100      5/8/2014    2/8/2014   nan
B         200      7/8/2014    5/8/2014   100
A         200      4/8/2014    3/8/2014   nan
A         300      6/8/2014    4/8/2014   350

I've tried the following but keep getting 'Lengths must match to compare' error.

grp = df.groupby('id')
df['sum'] = grp.apply(lambda x: x[x['date1'] == df['date2'].values]['value'].sum())

Would appreciate any advice!

Community
  • 1
  • 1
tbk
  • 523
  • 4
  • 9
  • Problem is with: `x['date1'] == df['date2']`, Series have different lengths. – jezrael Dec 16 '15 at 08:29
  • IIUC doesn't the following give you what you want: `df1 = df.set_index('id') df1.loc[df1['date1'].isin(df1['date2']),'value'].sum(level=0)`? – EdChum Dec 16 '15 at 09:04

2 Answers2

1

You can apply function to groupby where use another apply with replace 0 to NaN:

print df

#  id  value      date1      date2  sum
#0  A    150 2014-04-08 2014-03-08  NaN
#1  B    100 2014-05-08 2014-02-08  NaN
#2  B    200 2014-07-08 2014-05-08  100
#3  A    200 2014-04-08 2014-03-08  NaN
#4  A    300 2014-06-08 2014-04-08  350


def f(x):
    x['sum1'] = x.date2.apply(lambda y:  x[x.date1 == y].value.sum()).replace(0, np.nan)
    return x

df = df.groupby('id').apply(f)

print df

#  id  value      date1      date2  sum  sum1
#0  A    150 2014-04-08 2014-03-08  NaN   NaN
#1  B    100 2014-05-08 2014-02-08  NaN   NaN
#2  B    200 2014-07-08 2014-05-08  100   100
#3  A    200 2014-04-08 2014-03-08  NaN   NaN
#4  A    300 2014-06-08 2014-04-08  350   350
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252
  • thanks, but actually I'm trying to achieve the `sum` column in my example. sorry if that isn't clear. – tbk Dec 16 '15 at 09:38
  • thanks, but this ignores the `id`, hence not exactly what I need. but thanks for the prompt responses! – tbk Dec 17 '15 at 06:08
  • it works! thanks jezrael, appreciate your helpfulness! – tbk Dec 17 '15 at 09:28
0

You can use this apply function:

def checkSum(record):
    date2 = record
    sum = df[df["date1"] == date2]["value"].sum()
    if sum == 0:
        return float('nan')
    else:
        return sum

df['sum2'] = df["date2"].apply(checkSum)
  • thanks, but this ignores the `id` field too. I tried `df['sum2'] = grp["date2"].apply(checkSum)` but got the _Series lengths must match to compare_ error. – tbk Dec 17 '15 at 06:14