i am having a hard time using a group by + where to apply a sum to a broader range.
given this code:
from io import StringIO
import numpy as np
f = pd.read_csv(StringIO("""
fund_id,l_s,val
fund1,L,10
fund1,L,20
fund1,S,30
fund2,L,15
fund2,L,25
fund2,L,35
"""))
# fund total - works as expected
f['fund_total'] = f.groupby('fund_id')['val'].transform(np.sum)
# fund L total - applied only to L rows.
f['fund_total_l'] = f[f['l_s'] == "L"].groupby('fund_id')['val'].transform(np.sum)
f
numbers are correct, but i would like fund_total_l
column to show 30
for all rows of fund1 (not just L
). I want a fund level summary, but sum filtered by the l_s
column
i know i can do this with multiple steps, but this needs to be a single operation. i can use a separate generic function if that helps.
playground: https://repl.it/repls/UnusualImpeccableDaemons