0

My dataset has Customer_Code, As_Of_Date and 24 products. The products have a value of 0 -1. I ordered the data set by customer code and as_of_date. I want to subtract from the next row in the products to the previous row. The important thing here is to get each customer out according to their as_of_date.

I try

df2.set_index('Customer_Code').diff()

and

df2.set_index('As_Of_Date').diff()

and

for i in new["Customer_Code"].unique():
df14 = df12.set_index('As_Of_Date').diff()

but is not true. My code is true for first customer but it is not true for second customer. How I can do?

  • no data as images - include as text in your post. https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples – Rob Raymond Mar 01 '21 at 05:28

1 Answers1

0

You didn't share any data so I made up something that you may use. Your expected outcome also lacks. For further reference, please do not share images. Let's say you have this data:

  id        date   product 
0    12  2008-01-01         1
1    12  2008-01-01         2
2    12  2008-01-01         1
3    12  2008-01-02         4
4    12  2008-01-02         5
5    34  2009-01-01         6
6    34  2009-01-01         7
7    34  2009-01-01        84
8    34  2009-01-02         4
9    34  2009-01-02         3
10   34  2009-01-02         3
11   34  2009-01-03         5
12   34  2009-01-03         6
13   34  2009-01-03         8

As I understand it, you want to substract the product value from the previous row, grouped by id and date. (if any other group, adapt). You then need to do this:

mask = df.duplicated(['id', 'date'])
df['product_diff'] = (np.where(mask, (df['product'] - df['product'].shift(1)), np.nan))

which returns:

    id        date   product   product_diff
0    12  2008-01-01         1           NaN
1    12  2008-01-01         2           1.0
2    12  2008-01-01         1          -1.0
3    12  2008-01-02         4           NaN
4    12  2008-01-02         5           1.0
5    34  2009-01-01         6           NaN
6    34  2009-01-01         7           1.0
7    34  2009-01-01        84          77.0
8    34  2009-01-02         4           NaN
9    34  2009-01-02         3          -1.0
10   34  2009-01-02         3           0.0
11   34  2009-01-03         5           NaN
12   34  2009-01-03         6           1.0
13   34  2009-01-03         8           2.0

or if you want it the other way around:

mask = df.duplicated(['id', 'date'])

df['product_diff'] = (np.where(mask, (df['product'] -  df['product'].shift(-1)), np.nan))

which gives:

   id        date   product   product_diff
0    12  2008-01-01         1           NaN
1    12  2008-01-01         2           1.0
2    12  2008-01-01         1          -3.0
3    12  2008-01-02         4           NaN
4    12  2008-01-02         5          -1.0
5    34  2009-01-01         6           NaN
6    34  2009-01-01         7         -77.0
7    34  2009-01-01        84          80.0
8    34  2009-01-02         4           NaN
9    34  2009-01-02         3           0.0
10   34  2009-01-02         3          -2.0
11   34  2009-01-03         5           NaN
12   34  2009-01-03         6          -2.0
13   34  2009-01-03         8           NaN