3

I have a dataframe like this one

df = pd.DataFrame({'week': [1, 1, 1, 2, 2, 2, 3, 3, 3],
                   'p':  list(range(1, 4)) * 3,
                    'q': [4, 2, 1, 6, 2, 1, 6, 3, 2]})
df



   week p   q
0   1   1   4
1   1   2   2
2   1   3   1
3   2   1   6
4   2   2   2
5   2   3   1
6   3   1   6
7   3   2   3
8   3   3   2

In this dataframe, p identifies a product (1, 2, and 3) and q is the quantity sold in a week. I need to create two variables with within week information. The first one should have the ratio of q relative to that of the product with p=3. The second one should have the ratio relative to product 2 (for product 1) and relative to product 3 (for product 2). The desired output should be:

   week p   q   d1  d2
0   1   1   4   4.0 2.0
1   1   2   2   2.0 2.0
2   1   3   1   1.0 1.0
3   2   1   6   6.0 3.0
4   2   2   2   2.0 2.0
5   2   3   1   1.0 1.0
6   3   1   6   3.0 2.0
7   3   2   3   1.5 1.5
8   3   3   2   1.0 1.0

This answer is related and it allows me to create the first variable (I would need to replace the NaN with ones for product p = 3).

df['d1']=np.nan
df['d1']=df.loc[df.p!=3,'d1'].\
                fillna(df.q/df.groupby('week').q.transform('last'))
df

   week p   q   d1
0   1   1   4   4.0
1   1   2   2   2.0
2   1   3   1   NaN
3   2   1   6   6.0
4   2   2   2   2.0
5   2   3   1   NaN
6   3   1   6   3.0
7   3   2   3   1.5
8   3   3   2   NaN

Two things remain that I haven't been able to figure out. First, if instead of defining the first variable relative to p=3, I were to define it relative to p=2, how would I modify the code above? (or what alternative would be better?)

Second, how can I generate the second variable? (that has the value of q relative to that of p=2 for p=1, and relative to that of p=3 for p=2).

flucoe
  • 263
  • 2
  • 11
  • so for `d1` you want to divide `df.q` by the last week value of each product except excluding product 3 ? and the same for d2 but excluding p 2 ? – Umar.H Jun 12 '20 at 22:43
  • 1
    No. For `d1` I want to divide df.q by that associated with product 3 (it happens that in this example it is the last one but it need not). For `d2` I want to divide df.q of product 1 by that of product 2, and df.q of product 2 by that of product 3. – flucoe Jun 12 '20 at 22:52

1 Answers1

3

Try this:

df['d1'] = df.q / df.q.where(df.p.eq(3)).groupby(df.week).transform('first')
df['d2'] = df.q / df.groupby('week').q.shift(-1).fillna(df.q)

Out[74]:
   week  p  q   d1   d2
0     1  1  4  4.0  2.0
1     1  2  2  2.0  2.0
2     1  3  1  1.0  1.0
3     2  1  6  6.0  3.0
4     2  2  2  2.0  2.0
5     2  3  1  1.0  1.0
6     3  1  6  3.0  2.0
7     3  2  3  1.5  1.5
8     3  3  2  1.0  1.0
Andy L.
  • 24,909
  • 4
  • 17
  • 29