I have a dataset of black Friday sales. The columns are User_ID, Product_ID, Gender, Occupation, Product_Category, Purchase, Marital_Status, etc.
After analyzing the data, I found that the attribute User_ID has redundant entries (i.e. Single customer buying multiple goods). The total number of entries is 537377 and after I apply df = df.groupby('User_ID')
, the number of entries is reduced to 5891. I want to extract all the unique rows (i.e. a unique row per customer)from the pandas groupby object. Is there any way to do so?
I tried summing up each purchase amount corresponding to unique User_ID, but that does not help.
df = df.groupby('User_ID')
df['Purchase'].transform('sum')
for key, item in df:
print(df.get_group(key), "\n\n")
After executing the above code, the result I get is:
User_ID Gender Age Occupation City_Category \
0 1000001 F 0-17 10 A
1 1000001 F 0-17 10 A
2 1000001 F 0-17 10 A
3 1000001 F 0-17 10 A
39180 1000001 F 0-17 10 A
4 1000002 M 55+ 16 C
39181 1000002 M 55+ 16 C
39182 1000002 M 55+ 16 C
39183 1000002 M 55+ 16 C
39184 1000002 M 55+ 16 C
78147 1000002 M 55+ 16 C
Product_Category_2 Product_Category_3 Purchase
0 0.0 0.0 8370
1 6.0 14.0 15200
2 0.0 0.0 1422
3 14.0 0.0 1057
39180 4.0 8.0 12842
4 0.0 0.0 7969
39181 17.0 0.0 6187
39182 16.0 0.0 10074
39183 8.0 14.0 5260
39184 16.0 0.0 7927
78147 16.0 0.0 7791
What I actually want is after dropping product_category_2, 3 is with attribute purchase containing the sum of the total money spent
User_ID Gender Age Occupation City_Category Purchase
0 1000001 F 0-17 10 A 38891
1 1000002 M 55+ 16 C 37239