
I have a DataFrame like this:

import pandas as pd

df=pd.DataFrame()
df['exchange'] = [1, 1, 1, 2, 3]
df['type'] = ['deposit', 'deposit', 'trade', 'deposit', 'deposit']
df['value'] = [10, 10, '30', '40', '100']

which looks like:

    exchange     type       value
 0  1            deposit    10
 1  1            deposit    10
 2  1            trade      30
 3  2            deposit    40
 4  3            deposit    100

I want to add up the elements in the "value" column where "type" == 'deposit', grouped by "exchange", and forward-fill the result to get something like this:

    exchange     type       value   balance
 0  1            deposit    10      10
 1  1            deposit    10      20
 2  1            trade      30      20
 3  2            deposit    40      40
 4  3            deposit    100     100

where "balance" is the sum of deposits filtered by "exchange".

Is there a way to do this pythonically without for loops/if statements?

MathMan 99

3 Answers


You can first select the rows where type is "deposit", group them by "exchange", apply np.cumsum, and finally assign the result back to those rows.

import pandas as pd
import numpy as np

# cumulative sum of "value" per exchange, computed on the deposit rows only
deposit = df["type"] == "deposit"
df.loc[deposit, "balance"] = (
    df.loc[deposit].groupby("exchange", sort=False)["value"].apply(np.cumsum)
)

Finally, you can fill the missing values with a forward fill, as you mentioned.

df = df.fillna(method='ffill')
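
For reference, here is a self-contained sketch of the same idea, assuming the "value" column is numeric (the sample frame in the question mixes ints and strings). It uses the built-in GroupBy.cumsum, which is equivalent to apply(np.cumsum) here, and .ffill(), since fillna(method='ffill') is deprecated in recent pandas versions:

import pandas as pd

df = pd.DataFrame({
    "exchange": [1, 1, 1, 2, 3],
    "type": ["deposit", "deposit", "trade", "deposit", "deposit"],
    "value": [10, 10, 30, 40, 100],   # numeric, unlike the mixed int/str column in the question
})

# running total of deposits, written only onto the deposit rows
deposit = df["type"] == "deposit"
df.loc[deposit, "balance"] = df.loc[deposit].groupby("exchange", sort=False)["value"].cumsum()

# carry the last known balance forward onto the remaining (trade) rows
df["balance"] = df["balance"].ffill()
print(df)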

You could use groupby + cumsum to fill in the "balance" column. Since that would include the "trade" rows in the running sum as well, first mask the values whose type is not "deposit", then forward fill the resulting gaps with ffill:

df['value'] = df['value'].astype(int)  # "value" mixes ints and strings, so make it numeric first
# hide non-deposit values, take the running sum per exchange, then forward fill the gaps
df['balance'] = df['value'].mask(df['type']!='deposit').groupby(df['exchange']).cumsum().ffill()

Output:

   exchange     type  value  balance
0         1  deposit     10     10.0
1         1  deposit     10     20.0
2         1    trade     30     20.0
3         2  deposit     40     40.0
4         3  deposit    100    100.0
  • @ALollz I see; so we need to `mask` it before `groupby` right? Thanks for pointing it out. –  Mar 10 '22 at 16:07
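
To illustrate that point, here is a small sketch on a hypothetical variant of the data (a deposit after a trade on the same exchange): masking after the cumulative sum would let the trade value leak into the balance, while masking first keeps it out.

import pandas as pd

# hypothetical data: exchange 1 has a deposit *after* a trade
df2 = pd.DataFrame({'exchange': [1, 1, 1],
                    'type': ['deposit', 'trade', 'deposit'],
                    'value': [10, 30, 10]})
is_deposit = df2['type'].eq('deposit')

# mask before the cumsum (as in the answer): the trade value never enters the running total
print(df2['value'].mask(~is_deposit).groupby(df2['exchange']).cumsum().ffill().tolist())
# [10.0, 10.0, 20.0]

# mask after the cumsum: the trade value (30) is already baked into the later deposit row
print(df2['value'].groupby(df2['exchange']).cumsum().mask(~is_deposit).ffill().tolist())
# [10.0, 10.0, 50.0]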

You can use where to set the non-deposit rows to NaN and then take an expanding sum within each exchange group. The sum skips NaN (treating it as 0), so the running total is simply carried forward on those rows, which gives you the forward fill you want.

# NaN out the non-deposit rows, then take a running (expanding) sum within each exchange
df['balance'] = (df['value'].where(df['type'].eq('deposit'))
                   .groupby(df['exchange'])
                   .expanding().sum()
                   .reset_index(0, drop=True))  # drop the exchange level so the index aligns with df

   exchange     type  value  balance
0         1  deposit     10     10.0
1         1  deposit     10     20.0
2         1    trade     30     20.0
3         2  deposit     40     40.0
4         3  deposit    100    100.0
ALollz
  • I think both your and @enke's solutions work only if "exchange" is sorted. In my solution I added `sort=False` in the `groupby`. – Salvatore Daniele Bianco Mar 10 '22 at 16:10
  • @SalvatoreDanieleBianco No it will work regardless. `groupby` maintains the order of rows within the group, the `sort` only determines whether the keys you group by are sorted in the end, it has nothing to do with individual rows. : https://stackoverflow.com/questions/47185894/iteration-order-with-pandas-groupby-on-a-pre-sorted-dataframe/52557580#52557580 – ALollz Mar 10 '22 at 16:15
  • I think this is the most syntactically correct answer –  Mar 10 '22 at 16:18
  • @ALollz yes sorry I was wrong. I have tried with a not-sorted "exchange" and it works fine. – Salvatore Daniele Bianco Mar 10 '22 at 16:23
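
A quick way to check the ordering point from the comments above, on a hypothetical reshuffled version of the sample data in which "exchange" is not sorted:

import pandas as pd

# "exchange" deliberately unsorted; the running balance still comes out right,
# because groupby preserves the original row order within each group
df3 = pd.DataFrame({'exchange': [2, 1, 1, 1, 3],
                    'type': ['deposit', 'deposit', 'deposit', 'trade', 'deposit'],
                    'value': [40, 10, 10, 30, 100]})

df3['balance'] = (df3['value'].where(df3['type'].eq('deposit'))
                     .groupby(df3['exchange'])
                     .expanding().sum()
                     .reset_index(0, drop=True))
print(df3['balance'].tolist())   # [40.0, 10.0, 20.0, 20.0, 100.0]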