0

I have a DataFrame like this.

         order_id  product_id  add_to_cart_order  reordered  user_id eval_set  order_number  order_dow  order_hour_of_day  days_since_prior_order
0               1       49302                  1          1   112108    train             4          4                 10                     9.0
1               1       11109                  2          1   112108    train             4          4                 10                     9.0
2               1       10246                  3          0   112108    train             4          4                 10                     9.0
3               1       49683                  4          0   112108    train             4          4                 10                     9.0
4               1       43633                  5          1   112108    train             4          4                 10                     9.0
5               1       13176                  6          0   112108    train             4          4                 10                     9.0
6               1       47209                  7          0   112108    train             4          4                 10                     9.0
7               1       22035                  8          1   112108    train             4          4                 10                     9.0
8              36       39612                  1          0    79431    train            23          6                 18                    30.0
9              36       19660                  2          1    79431    train            23          6                 18                    30.0
10             36       49235                  3          0    79431    train            23          6                 18                    30.0
11             36       43086                  4          1    79431    train            23          6                 18                    30.0
12             36       46620                  5          1    79431    train            23          6                 18                    30.0
13             36       34497                  6          1    79431    train            23          6                 18                    30.0
14             36       48679                  7          1    79431    train            23          6                 18                    30.0
15             36       46979                  8          1    79431    train            23          6                 18                    30.0
16             38       11913                  1          0    42756    train             6          6                 16                    24.0
17             38       18159                  2          0    42756    train             6          6                 16                    24.0

So all rows tagged order_id == 1, all are corresponding to 1 order. I would like to figure out how many unique orders a user has made. The outcome should look like:

user_id   unique_orders
 112108               1
  79431               1
  42756               1

It's obvious that I should groupBy order_id. But what is the agg here? Do I need to do two groupBys? What is the best way to do this?

Jay Parthasarthy
  • 327
  • 1
  • 4
  • 11

2 Answers2

0

as suggested @Quang Hoang in the comment, this question is of direct application of GroupBy.nunique, I add what is necessary to obtain a dataframe like the one shown in your expected output:

df_nunique=df.groupby('user_id')['order_id'].nunique().rename('unique_orders').reset_index()
print(df_nunique)

   user_id  unique_orders
0    42756              1
1    79431              1
2   112108              1
ansev
  • 30,322
  • 5
  • 17
  • 31
0

You can try with the following code. It seems that len(unique()) performs better than nunique()

print(df.groupby('user_id')['order_id'].apply(lambda x: len(x.unique())))
Celius Stingher
  • 17,835
  • 6
  • 23
  • 53