2

How to create cumulative sum (new_supply)in dataframe python from demand column from table

item   Date       supply  demand
A     2018-01-01  -       10
A     2018-01-02  -       15
A     2018-01-03  100     30
A     2018-01-04  -       10
A     2018-01-05  -       40
A     2018-01-06  50      50
A     2018-01-07  -       10
B     2018-01-01  -       20
B     2018-01-02  -       30
B     2018-01-03  20      60
B     2018-01-04  -       20
B     2018-01-05  100     10
B     2018-01-06  -       20
B     2018-01-07  -       30

New Desired table from the above table

item   Date       supply  demand  new_supply
A     2018-01-01  -       10      -
A     2018-01-02  -       15      -
A     2018-01-03  100     30      55
A     2018-01-04  -       10      -
A     2018-01-05  -       40      -
A     2018-01-06  50      50      100
A     2018-01-07  -       10      -
B     2018-01-01  -       20      -
B     2018-01-02  -       30      -
B     2018-01-03  20      60      110
B     2018-01-04  -       20      -
B     2018-01-05  100     10      140
B     2018-01-06  -       20      -
B     2018-01-07  -       30      -
Karn Kumar
  • 8,518
  • 3
  • 27
  • 53

2 Answers2

1

I believe you need DataFrameGroupBy.cumsum with numpy.where with isnumeric for set new values only for numeric ones by column supply:

m = df['supply'].str.isnumeric()
g = m.iloc[::-1].cumsum()

df['new'] = np.where(m, df.groupby(['item',g])['demand'].cumsum(), '-')
print (df)
   item        Date supply  demand  new
0     A  2018-01-01      -      10    -
1     A  2018-01-02      -      15    -
2     A  2018-01-03    100      30   55
3     A  2018-01-04      -      10    -
4     A  2018-01-05      -      40    -
5     A  2018-01-06     50      50  100
6     A  2018-01-07      -      10    -
7     B  2018-01-01      -      20    -
8     B  2018-01-02      -      30    -
9     B  2018-01-03     20      60  110
10    B  2018-01-04      -      20    -
11    B  2018-01-05    100      10   30
12    B  2018-01-06      -      20    -
13    B  2018-01-07      -      30    -

Or:

m = df['supply'].str.isnumeric()

df['new'] = np.where(m, df.groupby('item')['demand'].cumsum(), '-')
print (df)
   item        Date supply  demand  new
0     A  2018-01-01      -      10    -
1     A  2018-01-02      -      15    -
2     A  2018-01-03    100      30   55
3     A  2018-01-04      -      10    -
4     A  2018-01-05      -      40    -
5     A  2018-01-06     50      50  155
6     A  2018-01-07      -      10    -
7     B  2018-01-01      -      20    -
8     B  2018-01-02      -      30    -
9     B  2018-01-03     20      60  110
10    B  2018-01-04      -      20    -
11    B  2018-01-05    100      10  140
12    B  2018-01-06      -      20    -
13    B  2018-01-07      -      30    -

If need all values:

df['new'] = df.groupby('item')['demand'].cumsum()
print (df)
   item        Date supply  demand  new
0     A  2018-01-01      -      10   10
1     A  2018-01-02      -      15   25
2     A  2018-01-03    100      30   55
3     A  2018-01-04      -      10   65
4     A  2018-01-05      -      40  105
5     A  2018-01-06     50      50  155
6     A  2018-01-07      -      10  165
7     B  2018-01-01      -      20   20
8     B  2018-01-02      -      30   50
9     B  2018-01-03     20      60  110
10    B  2018-01-04      -      20  130
11    B  2018-01-05    100      10  140
12    B  2018-01-06      -      20  160
13    B  2018-01-07      -      30  190
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252
  • As it appears, I think the OP wants the cumsum to be resetted after a new supply is issued, right? I was about to write an answer, but I searched first and found that [you have already answered this](https://stackoverflow.com/a/32891081/2099607) :)) The only difference is that a groupby is needed here. – today Nov 04 '18 at 10:24
  • @today - yes, it should working nice at least 0 value in `supply`... But still not 100% sure what OP need :( – jezrael Nov 04 '18 at 10:28
  • Thanks brother, can answer please : https://stackoverflow.com/questions/53140438/how-to-create-cummulative-sum-in-dataframe-python-part2 – Doni Suhartono Nov 04 '18 at 11:44
0

Simply as:

df['new_supply'] = df.demand.cumsum()

Precisely above should work, when we are looking for cumulative sum of a column.. Just simulating your DataFrame:

>>> df
   item        Date supply  demand
0     A  2018-01-01      -      10
1     A  2018-01-02      -      15
2     A  2018-01-03    100      30
3     A  2018-01-04      -      10
4     A  2018-01-05      -      40
5     A  2018-01-06     50      50
6     A  2018-01-07      -      10
7     B  2018-01-01      -      20
8     B  2018-01-02      -      30
9     B  2018-01-03     20      60
10    B  2018-01-04      -      20
11    B  2018-01-05    100      10
12    B  2018-01-06      -      23
13    B  2018-01-07      -      30

So, When we do df.<particulat_col>.cumsum() it will produce cumsun of that, in our case as follows with a new desired column along:

>>> df
   item        Date supply  demand  new_supply
0     A  2018-01-01      -      10          10
1     A  2018-01-02      -      15          25
2     A  2018-01-03    100      30          55
3     A  2018-01-04      -      10          65
4     A  2018-01-05      -      40         105
5     A  2018-01-06     50      50         155
6     A  2018-01-07      -      10         165
7     B  2018-01-01      -      20         185
8     B  2018-01-02      -      30         215
9     B  2018-01-03     20      60         275
10    B  2018-01-04      -      20         295
11    B  2018-01-05    100      10         305
12    B  2018-01-06      -      23         328
13    B  2018-01-07      -      30         358

If indeed we are looking for a aggregative or cumulative sum of two items , then it should be go along with groupby as follows:

>>> df['new_supply'] = df.groupby('item')['demand'].cumsum()

>>> df
   item        Date supply  demand  new_supply
0     A  2018-01-01      -      10          10
1     A  2018-01-02      -      15          25
2     A  2018-01-03    100      30          55
3     A  2018-01-04      -      10          65
4     A  2018-01-05      -      40         105
5     A  2018-01-06     50      50         155
6     A  2018-01-07      -      10         165
7     B  2018-01-01      -      20          20
8     B  2018-01-02      -      30          50
9     B  2018-01-03     20      60         110
10    B  2018-01-04      -      20         130
11    B  2018-01-05    100      10         140
12    B  2018-01-06      -      23         163
13    B  2018-01-07      -      30         193

Note: In case you need the output you desired then you have to modify your question with logic you are filtering the dataset indeed!

Karn Kumar
  • 8,518
  • 3
  • 27
  • 53