Rolling Difference using Pandas

Question

Hello I am trying to use Pandas rolling function to calculate a rolling difference on the table below. I am trying to produce the values in the Monthly available items column and not getting anywhere. Please help!

Item    Adds    Subtracts   Month   Monthly Available items
A       71       30          Jan       41
B       19        7          Jan       12
C       68       30          Jan       38
D       34       19          Jan       15
E       180      91          Jan       89
C       58       34          Feb       62
E       222      129         Feb       182
D       26       21          Feb       20
B       65       26          Feb       51
A       62       58          Feb       45

Python code so far:

rolling_triggers['Adds'].rolling(window = 2).apply(lambda x: x[1] - x[0])

This is what i started with but i need it to calculate the difference for each item. So far it looks like its calculating for all items . Also how can i the values in the subtracts column to get the value in the monthly available items? subtracts rolling_triggers['Adds'].rolling(window = 2).apply(lambda x: x[1] - x[0]) — user10297084, Aug 30 '18 at 20:25
So you have a two-month window? If you had March numbers too, you'd only want Monthly Available Items to pay attention to Feb and Mar? — DSM, Aug 30 '18 at 20:28
i have months that go all the way to August, this is just a sample of the data. — user10297084, Aug 30 '18 at 20:41
For the month of Febuary and i 'll use item A as an example. i calculate the Monthly available items as follows: 41+62-58 — user10297084, Aug 30 '18 at 20:43
Does `August` include numbers from Jan, Feb, March, April, May, June and July, or simply the previous month? It's hard to tell if you need a `.rolling(2)` or a simple `.cumsum()`. — ALollz, Aug 30 '18 at 21:07
@ALollz: I tried to extract that answer earlier and failed. :-) — DSM, Aug 30 '18 at 21:13

score 0 · Accepted Answer · answered Aug 30 '18 at 21:35

If I'm understanding your code and comments correctly, this will do what you want:

months = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']

df['Month'] = pd.Categorical(df['Month'], categories = months)
df = df.sort_values(['Item', 'Month'])
df['Net_Items'] = df['Adds'] - df['Subtracts']
df['Monthly_Available_Items'] = df.groupby('Item')['Net_Items'].cumsum()
df = df.reset_index(drop = True)

It will return:

  Item  Adds  Subtracts Month  Net_Items  Monthly_Available_Items
0    A    71         30   Jan         41                       41
1    A    62         58   Feb          4                       45
2    B    19          7   Jan         12                       12
3    B    65         26   Feb         39                       51
4    C    68         30   Jan         38                       38
5    C    58         34   Feb         24                       62
6    D    34         19   Jan         15                       15
7    D    26         21   Feb          5                       20
8    E   180         91   Jan         89                       89
9    E   222        129   Feb         93                      182

First, convert the month column to a Categorical (because alphabetically, December is before January, etc). Next, calculate Net_Items as the difference between Adds and Subtracts. Finally calculate a cumulative sum for each of the products.

Assuming January is the first month, the intuition is that there were 71 of Item A added and 30 removed for a net total of 31. In the next month, 62 were added and 58 were removed, so the net total of 4 is added to the prior month's net total of 41 for a running total of 45.

Awesome this worked beautifully when extend to August thank you! — user10297084, Aug 30 '18 at 22:02
I don't think it will make a real difference whether you use an integer to represent the months or a Categorical. — cpander, Aug 30 '18 at 22:02

Rolling Difference using Pandas

1 Answers1

Linked