1

I have been stuck for 3 hours on this problem. I have a DF like that : p = product order = number of sales

I don't have the release date of the product so I assume that the release date is the first date with some sales.

Here is my dataframe :

p order
A  0     
A  0     
A  1     
A  1       
A  2     
B  0     
B  0     
B  1     
B  1     

this is what I would like : an incrementation of days since release on columns d_s_r (days since release).

p order d_s_r
A  0      0
A  0      0
A  1      1
A  1      2  
A  2      3
B  0      0
B  0      0
B  1      1
B  1      2

What would be your recommendation :

I tried :

for i, row in data[data.order > 0].groupby('p') :
    list_rows = row.index.tolist()
    for m, k in enumerate(list_rows):
        data.loc[k,'s_d_r'] = m +1

seems to be working but it takes too much time....

i'm sure there is an easy way but can't find id. thanks in advance...

Edit :

Here's my df : 
df = pd.DataFrame([['A',0,0],['A',0,0],['A',12,1],['A',23,5],['A',25,7]
                ,['B',0,0],['B',2,0],['B',8,5],['B',15,12],['B',0,3],['B',0,3],['B',5,4]], columns=['prod','order','order_2'])

with the df.groupby('prod')['order'].transform(lambda x : x.cumsum().factorize()[0])

I get :



prod    order   order_2 d_s_r
0   A   0   0   0
1   A   0   0   0
2   A   12  1   1
3   A   23  5   2
4   A   25  7   3
5   B   0   0   0
6   B   2   0   1
7   B   8   5   2
8   B   15  12  3
9   B   0   3   3
10  B   0   3   3
11  B   5   4   4

When I would like :


prod    order   order_2 d_s_r
0   A   0   0   0
1   A   0   0   0
2   A   12  1   1
3   A   23  5   2
4   A   25  7   3
5   B   0   0   0
6   B   2   0   1
7   B   8   5   2
8   B   15  12  3
9   B   0   3   4
10  B   0   3   5
11  B   5   4   6

generally have 0's at the beginning of each groupby.('p') but i could eventually have directly some actual values. And I can, have 0 order some day(which put's back the counter to 0 here), but still want my counter since release date of product


I actually managed to get my results by adding a dummy column with only "1" and by doing df[df.o' > 0].groupby('p').cumsum() but I don't think it's really interesting...

MeshellT
  • 13
  • 1
  • 4
  • Does this answer your question? [Python pandas cumsum with reset everytime there is a 0](https://stackoverflow.com/questions/45964740/python-pandas-cumsum-with-reset-everytime-there-is-a-0) – roganjosh Apr 20 '20 at 16:49

1 Answers1

2

groupby on p + cumsum on order with factorize

df['d_s_r'] = df.groupby('p')['order'].cumsum().factorize()[0]
print(df)

   p  order  d_s_r
0  A      0      0
1  A      0      0
2  A      1      1
3  A      1      2
4  A      2      3
5  B      0      0
6  B      0      0
7  B      1      1
8  B      1      2
anky
  • 74,114
  • 11
  • 41
  • 70
  • Hi , I tried your code but it didn't work on my large dataframe and made some funny results. I don't understand the use of cumsum() here ? – MeshellT Apr 20 '20 at 18:37
  • just did... not sure how top edit a question thought... thanks for your help btw – MeshellT Apr 20 '20 at 18:54
  • @MeshellT can you also try `df['d_s_r'] =df.groupby('p')['order'].transform(lambda x: x.cumsum().factorize()[0])` ? Does group `p` matter for your case? , also your input v/s output doesnt match , example you have 2 B in the end in the input vs 1 in the output – anky Apr 20 '20 at 19:15
  • Hi thanks ! So - it is working, but only in the case that I don't have any order equal to 0 to a product. Otherwise it doesn't increment which doesn't fit to my problem. Should i see the problem in a different perspective ? – MeshellT Apr 21 '20 at 07:30
  • @MeshellT yeah if you have a 0 in the middle of a group , that would require a different approach. instead of `lambda x: x.cumsum().factorize()[0]` may be you can try `lambda x : (x.cumsum()-x.cumsum().where(~x).ffill().fillna(0)).factorize()[0]` but it will be better to ask a new question with all possibilities covered in the body of the question :) – anky Apr 21 '20 at 07:35