I have a DataFrame like this.
order_id product_id add_to_cart_order reordered user_id eval_set order_number order_dow order_hour_of_day days_since_prior_order
0 1 49302 1 1 112108 train 4 4 10 9.0
1 1 11109 2 1 112108 train 4 4 10 9.0
2 1 10246 3 0 112108 train 4 4 10 9.0
3 1 49683 4 0 112108 train 4 4 10 9.0
4 1 43633 5 1 112108 train 4 4 10 9.0
5 1 13176 6 0 112108 train 4 4 10 9.0
6 1 47209 7 0 112108 train 4 4 10 9.0
7 1 22035 8 1 112108 train 4 4 10 9.0
8 36 39612 1 0 79431 train 23 6 18 30.0
9 36 19660 2 1 79431 train 23 6 18 30.0
10 36 49235 3 0 79431 train 23 6 18 30.0
11 36 43086 4 1 79431 train 23 6 18 30.0
12 36 46620 5 1 79431 train 23 6 18 30.0
13 36 34497 6 1 79431 train 23 6 18 30.0
14 36 48679 7 1 79431 train 23 6 18 30.0
15 36 46979 8 1 79431 train 23 6 18 30.0
16 38 11913 1 0 42756 train 6 6 16 24.0
17 38 18159 2 0 42756 train 6 6 16 24.0
So all rows tagged order_id == 1, all are corresponding to 1 order. I would like to figure out how many unique orders a user has made. The outcome should look like:
user_id unique_orders
112108 1
79431 1
42756 1
It's obvious that I should groupBy order_id
. But what is the agg here? Do I need to do two groupBys? What is the best way to do this?