0

I have a data frame consisting of three columns e.g. orderid,time and status. Every order can have many statuses e.g. new, filled, partial, cancelled. so order id 123 can go from new to cancelled etc etc. or it can have new under it 10 times.

My question is after i have done a group by order id and time, how can i find the count of each order status pattern? e..g how many times does New, Filled appear? How many times does New, New, Cancelled appear?

I tried the below, but i just don't know how i can get the result I want.

sortedOrders=OrdersAll.sort_values(['ordid','timestamp'], ascending=[True, True])
sortedOrdersAll.groupby(['ordid','ostatus']).count()

enter image description here

Maths12
  • 852
  • 3
  • 17
  • 31
  • 1
    Can you show us some sample data ? – BENY May 12 '19 at 17:51
  • I can't upload mine since it is sensitive- i have created a small dummy sample but not sure how to upload this – Maths12 May 12 '19 at 18:29
  • @Maths12 press CRTL+K on edit question and paste as code – anky May 12 '19 at 18:39
  • @anky_91 sorry it didn't work- i could only paste as image. The data is something i have created. – Maths12 May 12 '19 at 19:38
  • It seems like your question is similar to finding a subsequence in a list, and if found, then increment. Maybe do something like `df['Status'].values` and along with your specific `Pattern` of `New, New, Filled` try to find how many occurrences there are of `Pattern` in the list. See here: https://stackoverflow.com/questions/3313590/check-for-presence-of-a-sliced-list-in-python or here: https://stackoverflow.com/questions/33392219/how-to-check-subsequence-exists-in-a-list?noredirect=1&lq=1 – Monty May 12 '19 at 20:29

1 Answers1

1

I created a dummy dataframe df. Below you can refer for the logic for getting the count for the status pattern.

In [109]: status = 'new,filled,partial,cancelled'.split(',')
In [102]: df = pd.DataFrame( [ [ random.randint(1,25),  random.randint(100, 200), status[random.randint(0,3)] ] for _ in range(50) ], columns=['order_id','timestamp' ,'status'])

In [103]: df.head(10)
Out[103]:
   order_id  timestamp     status
0        20        120        new
1         9        118  cancelled
2        16        125    partial
3         9        124  cancelled
4         2        190     filled
5         3        185    partial
6         5        162     filled
7        21        101        new
8        25        115     filled
9        14        141     filled

In [104]: df_grouped = df.groupby('order_id', as_index=False)

In [105]: def status_transition_with_timestamp(each_grouped_df):
     ...:     sorted_df = each_grouped_df.sort_values('timestamp', ascending=True)
     ...:     concatenated_transition = ','.join(sorted_df['status'])
     ...:     return concatenated_transition
     ...:

In [106]: result = df_grouped['status'].agg(status_transition_with_timestamp)

In [107]: result.head(10)
Out[107]:
   order_id                       status
0         1                       filled
1         2             filled,cancelled
2         3    partial,cancelled,partial
3         4         filled,new,cancelled
4         5             filled,cancelled
5         6                          new
6         7                       filled
7         9  partial,cancelled,cancelled
8        10                cancelled,new
9        11                  new,partial

In [108]: result.groupby('status').count()
Out[108]:
                                           order_id
status
cancelled,new                                     1
filled                                            4
filled,cancelled                                  2
filled,new,cancelled                              1
filled,partial,partial                            1
new                                               2
new,cancelled                                     2
new,filled                                        1
new,new,filled                                    1
new,new,new,partial,partial,cancelled,new         1
new,partial                                       1
partial                                           1
partial,cancelled,cancelled                       1
partial,cancelled,partial                         1
partial,partial                                   1
partial,partial,new,partial,new                   1

yardstick17
  • 4,322
  • 1
  • 26
  • 33