I have a dataframe, as follows:
customer_id order_ts order_nbr item_nbr
162038 04/04/23 18:42 1258 972395
162038 04/04/23 18:42 1258 551984
162038 04/04/23 18:42 1258 488298
162038 04/04/23 18:42 1258 649230
162038 26/02/23 16:28 2715 372225
162038 26/02/23 16:28 2715 911716
162038 26/02/23 16:28 2715 696677
162038 26/02/23 16:28 2715 229455
162038 26/02/23 16:28 2715 870016
162038 29/01/23 13:07 1171 113719
162038 29/01/23 13:07 1171 553461
162060 01/05/23 18:42 1259 300911
162060 01/05/23 18:42 1259 574962
162060 01/05/23 18:42 1259 843300
162060 01/05/23 18:42 1259 173719
162060 05/05/23 18:42 2719 254899
162060 05/05/23 18:42 2719 776553
162060 05/05/23 18:42 2719 244739
162060 05/05/23 18:42 2719 170742
162060 05/05/23 18:42 2719 525719
162060 10/05/23 18:42 1161 896919
162060 10/05/23 18:42 1161 759465
I am interested to create a column "row_num" as follows:
customer_id order_ts order_nbr item_nbr row_num
162038 04/04/23 18:42 1258 972395 1
162038 04/04/23 18:42 1258 551984 1
162038 04/04/23 18:42 1258 488298 1
162038 04/04/23 18:42 1258 649230 1
162038 26/02/23 16:28 2715 372225 2
162038 26/02/23 16:28 2715 911716 2
162038 26/02/23 16:28 2715 696677 2
162038 26/02/23 16:28 2715 229455 2
162038 26/02/23 16:28 2715 870016 2
162038 29/01/23 13:07 1171 113719 3
162038 29/01/23 13:07 1171 553461 3
162060 01/05/23 18:42 1259 300911 1
162060 01/05/23 18:42 1259 574962 1
162060 01/05/23 18:42 1259 843300 1
162060 01/05/23 18:42 1259 173719 1
162060 05/05/23 18:42 2719 254899 2
162060 05/05/23 18:42 2719 776553 2
162060 05/05/23 18:42 2719 244739 2
162060 05/05/23 18:42 2719 170742 2
162060 05/05/23 18:42 2719 525719 2
162060 10/05/23 18:42 1161 896919 3
162060 10/05/23 18:42 1161 759465 3
"row_num" column is basically unique for a particular customer_id, order_ts, order_nbr. For example, customer_id 162038, order_ts = 04/04/23 18:42, order_nbr = 1258, row_num = 1 for all the item numbers.
I did as follows:
windowDept = Window.partitionBy("customer_id","order_ts","order_nbr").orderBy(col("order_ts").desc())
df1 = df.withColumn("row_num",row_number().over(windowDept))
But I am getting this as output:
Any help will be highly appreciated. Thanks.