0

I am trying to work out the rolling difference between 2 timeseries DataFrames.

I have 2 data sets, df1 has an inbound timeseries and df2 an outbound series.

        Date    Code    Quantity
0   10/01/2019  A   20
1   10/01/2019  B   12
2   10/01/2019  C   10
3   11/01/2019  A   2
4   11/01/2019  B   30
5   11/01/2019  C   2
6   11/01/2019  D   1
7   12/01/2019  A   4
8   12/01/2019  B   6
9   12/01/2019  D   3
10  12/01/2019  E   2
11  13/01/2019  A   10
12  13/01/2019  B   12
13  13/01/2019  C   1

df2 - Out

    Date     Code   Quantity
0   11/01/2019  A   5
1   11/01/2019  B   1
2   11/01/2019  C   3
3   12/01/2019  A   100
4   12/01/2019  D   2
5   12/01/2019  E   1
6   13/01/2019  B   1
7   13/01/2019  C   1

I am trying to calculate the quantity at the end of each date (df1[quantity] - df2[quantity]), then add this to the quantity from the end of the previous day, with the stipulation that the quantity can not be <0 at any point.

Desired Output

     Date      Code Quantity
0   10/01/2019  A   20
1   10/01/2019  B   12
2   10/01/2019  C   10
3   11/01/2019  A   17
4   11/01/2019  B   41
5   11/01/2019  C   9
6   11/01/2019  D   1
7   12/01/2019  A   0
8   12/01/2019  B   47
9   12/01/2019  D   2
10  12/01/2019  E   1
11  13/01/2019  A   10
12  13/01/2019  B   58
13  13/01/2019  C   9

I think that a function will be the best way to achieve the desired output, but haven't been able to find anything to do this.

CTD91
  • 111
  • 9
  • Please do not include screenshots of data. It should be copyable text. Please see [how to ask a good pandas question](https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples) – noah Dec 21 '20 at 22:01
  • You should be able to solve with groupby. There are a lot of examples of using groupby to do very similar problems to this. While waiting for an answer consider giving groupby a whirl – noah Dec 21 '20 at 22:04
  • Hi Noah, I have updated with copyable text rather than the screenshots. I have had a look at the groupby method but couldn't see how I would be able to achieve the running total by day using the previous day as my base using this. – CTD91 Dec 21 '20 at 22:36

1 Answers1

1

I'm not sure if your desired output table is correct or not. I can match part of it but not all of it. You can play around with the code to see if you can reduce the lines but I left some extra lines so you can go through it and see what's happening. It wasn't clear where you wanted quantity to always be >=0, so there's commented line that will do that instead of calculating q1-q2 and ending up with negative numbers. Doing this, I still couldn't match your desired output. There's always another way, but this should give you ideas of where to look and how to optimize the code.

s = """        Date    Code    Quantity
0   10/01/2019  A   20
1   10/01/2019  B   12
2   10/01/2019  C   10
3   11/01/2019  A   2
4   11/01/2019  B   30
5   11/01/2019  C   2
6   11/01/2019  D   1
7   12/01/2019  A   4
8   12/01/2019  B   6
9   12/01/2019  D   3
10  12/01/2019  E   2
11  13/01/2019  A   10
12  13/01/2019  B   12
13  13/01/2019  C   1"""

df1 = pd.read_csv(io.StringIO(s), sep='\s+', engine='python')

s ='''    Date     Code   Quantity
0   11/01/2019  A   5
1   11/01/2019  B   1
2   11/01/2019  C   3
3   12/01/2019  A   100
4   12/01/2019  D   2
5   12/01/2019  E   1
6   13/01/2019  B   1
7   13/01/2019  C   1'''
df2 = pd.read_csv(io.StringIO(s), sep='\s+', engine='python')

df3 = df1.merge(df2, how='outer', left_on=['Date', 'Code'], right_on=['Date', 'Code']).fillna(0)
df3['Qty'] = df3['Quantity_x'] - df3['Quantity_y']
# df3['Qty'] = df3.apply(lambda x: (x['Quantity_x'] - x['Quantity_y']) if (x['Quantity_x'] - x['Quantity_y']) > 0 else 0, axis=1)
df3['Quantity'] = 0
def final_adder(x):
    x.Quantity_x = x.Quantity_x.shift(1, fill_value=0)
    x.Quantity = x.Quantity_x + x.Qty
    # print(x)
    return x
df_final = df3.groupby(['Code']).apply(final_adder)
df_final['Quantity'] = df_final['Quantity'].clip(lower=0)
df_final.drop(['Quantity_x', 'Quantity_y', 'Qty'], inplace=True, axis=1)
print(df_final)

Output

          Date Code  Quantity
0   10/01/2019    A      20.0
1   10/01/2019    B      12.0
2   10/01/2019    C      10.0
3   11/01/2019    A      17.0
4   11/01/2019    B      41.0
5   11/01/2019    C       9.0
6   11/01/2019    D       1.0
7   12/01/2019    A       0.0
8   12/01/2019    B      36.0
9   12/01/2019    D       2.0
10  12/01/2019    E       1.0
11  13/01/2019    A      14.0
12  13/01/2019    B      17.0
13  13/01/2019    C       2.0
Jonathan Leon
  • 5,440
  • 2
  • 6
  • 14
  • Hi Jonathan, looking at the output using A as an example it makes sense for the the first 3 dates but the 13/1/19 has In of '10' and out of '0' so I would expect and output of 10 (the 0 closing of 12/1/19 + (In (10) - Out (0)). I am not quite sure what is causing this – CTD91 Dec 22 '20 at 10:52
  • Maybe check the intermediate steps. I may be Adding the wrong dataframe in the last step. You may want to shift and add quantity rather than quantity_x – Jonathan Leon Dec 23 '20 at 00:18