I have been trying to do a data analysis exercise in R with sales data. The data frame looks like this:
  Order_ID Product
     <dbl> <chr>
1   319631 34in Ultrawide Monitor
2   319631 Lightning Charging Cable
3   319596 iPhone
4   319596 Lightning Charging Cable
5   319584 iPhone
6   319584 Wired Headphones
7   319556 Google Phone
8   319556 Wired Headphones
I need to find which products are most often bought together. Order_ID contains duplicates: rows that share an Order_ID belong to the same order, i.e. those products were purchased together by the same person.
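For reference, here is a minimal sketch of that computation in plain Python (standard library only, no pandas), run on the eight sample rows above. It groups products by order, skips orders with a single product, and counts every 2-product combination per order. As a small variation on the pandas code below, it sorts each order's products first, so a pair is counted the same way regardless of row order. The names `rows`, `products_by_order` and `pair_counts` are only illustrative.

```python
from collections import Counter, defaultdict
from itertools import combinations

# Sample rows copied from the data frame above: (Order_ID, Product)
rows = [
    (319631, "34in Ultrawide Monitor"),
    (319631, "Lightning Charging Cable"),
    (319596, "iPhone"),
    (319596, "Lightning Charging Cable"),
    (319584, "iPhone"),
    (319584, "Wired Headphones"),
    (319556, "Google Phone"),
    (319556, "Wired Headphones"),
]

# Collect the products of each order in a list keyed by Order_ID
products_by_order = defaultdict(list)
for order_id, product in rows:
    products_by_order[order_id].append(product)

# Count every pair of products that appear in the same order
pair_counts = Counter()
for products in products_by_order.values():
    if len(products) < 2:
        continue  # a single-product order has no pairs
    # sorted() makes ('A', 'B') and ('B', 'A') count as the same pair
    pair_counts.update(combinations(sorted(products), 2))

# Show the most frequent pairs, like count.most_common(5) in the pandas version
for pair, n in pair_counts.most_common(5):
    print(pair, n)
```

On this tiny sample every pair occurs exactly once; the ranking only becomes meaningful on the full data set.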
I solved this exercise in Python, but I can't reproduce it in R. My Python code was:
from collections import Counter
from itertools import combinations

pares_compras[['Order ID', 'Product']]
> Order ID Product
2 176560 Google Phone
3 176560 Wired Headphones
17 176574 Google Phone
18 176574 USB-C Charging Cable
29 176585 Bose SoundSport Headphones
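# Join all products of each order into one string, repeated on every row of that order
pares_compras['Grouped Products'] = pares_compras.groupby('Order ID')['Product'].transform(lambda x: ','.join(x))
pares_compras['Grouped Products']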
>
2 Google Phone,Wired Headphones
3 Google Phone,Wired Headphones
17 Google Phone,USB-C Charging Cable
18 Google Phone,USB-C Charging Cable
pares_compras = pares_compras[['Order ID', 'Grouped Products']].drop_duplicates()
pares_compras
> Order ID Grouped Products
2 176560 Google Phone,Wired Headphones
17 176574 Google Phone,USB-C Charging Cable
29 176585 Bose SoundSport Headphones,Bose SoundSport Hea...
31 176586 AAA Batteries (4-pack),Google Phone
118 176672 Lightning Charging Cable,USB-C Charging Cable
count = Counter()
for row in pares_compras['Grouped Products']:
    # Split the joined string back into products and count every 2-product combination
    row_list = row.split(',')
    count.update(Counter(combinations(row_list, 2)))
count
> Counter({('Google Phone', 'Wired Headphones'): 414,
('Google Phone', 'USB-C Charging Cable'): 987,
('Bose SoundSport Headphones', 'Bose SoundSport Headphones'): 27, ...})
for key, num in count.most_common(5):
    print(key, num)
>
('iPhone', 'Lightning Charging Cable') 1005
('Google Phone', 'USB-C Charging Cable') 987
('iPhone', 'Wired Headphones') 447
('Google Phone', 'Wired Headphones') 414
('Vareebadd Phone', 'USB-C Charging Cable') 361
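One detail worth checking when porting this: `combinations()` yields tuples in the order the products appear within each order, so ('iPhone', 'Lightning Charging Cable') and ('Lightning Charging Cable', 'iPhone') become separate keys in the Counter. The sketch below uses small hypothetical orders (not taken from the real data) to show this, and also where self-pairs such as ('Bose SoundSport Headphones', 'Bose SoundSport Headphones') come from: the same product listed twice in one order. Sorting, and optionally de-duplicating, each order's products before taking combinations is one possible way to handle both; whether self-pairs should count is an analysis choice.

```python
from collections import Counter
from itertools import combinations

# Hypothetical orders with the same two products in different row order
order_a = ["iPhone", "Lightning Charging Cable"]
order_b = ["Lightning Charging Cable", "iPhone"]

unsorted_counts = Counter()
sorted_counts = Counter()
for products in (order_a, order_b):
    # As in the loop above: each pair keeps the row order of its order
    unsorted_counts.update(combinations(products, 2))
    # Variant: sort first so both orders produce the same tuple
    sorted_counts.update(combinations(sorted(products), 2))

print(unsorted_counts)  # two separate keys, each counted once
print(sorted_counts)    # one key counted twice

# A product listed twice in one order pairs with itself
dup_order = ["Bose SoundSport Headphones", "Bose SoundSport Headphones"]
print(list(combinations(dup_order, 2)))
# De-duplicating with set() first removes such self-pairs
print(list(combinations(sorted(set(dup_order)), 2)))  # []
```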
That solves the exercise in Python, but as I said, I can't find a way to do the same in R; I've only just started with R. Any help would be really appreciated, thanks.