I have a pandas DataFrame of about 40k rows with two columns, `invoiceNo` and `item`:
```python
import pandas as pd

df = pd.DataFrame({'invoiceNo': ['123', '123', '124', '124'],
                   'item': ['plant', 'grass', 'hammer', 'screwdriver']})
```
A customer can buy several items under a single invoice number.
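To make the "several items per invoice" structure explicit, the rows can be collapsed into one basket per invoice. This is a minimal sketch using `groupby` with `agg(list)` on the sample data above; the name `baskets` is just illustrative.

```python
import pandas as pd

df = pd.DataFrame({'invoiceNo': ['123', '123', '124', '124'],
                   'item': ['plant', 'grass', 'hammer', 'screwdriver']})

# Collect every item on an invoice into one list: one "basket" per invoice.
baskets = df.groupby('invoiceNo')['item'].agg(list)
print(baskets)
```

The result is a Series indexed by `invoiceNo` whose values are the item lists.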
Is there a way to find which items are bought together most often?
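One loop-free sketch, assuming "bought together" means two distinct items appearing on the same invoice, is to merge the DataFrame with itself on `invoiceNo` and count the resulting item pairs. The suffixes `_a`/`_b` and the variable names are illustrative choices, not anything required by pandas.

```python
import pandas as pd

df = pd.DataFrame({'invoiceNo': ['123', '123', '124', '124'],
                   'item': ['plant', 'grass', 'hammer', 'screwdriver']})

# Drop repeated items on the same invoice so a pair is counted once per invoice.
df_unique = df.drop_duplicates(['invoiceNo', 'item'])

# Pair every item with every other item on the same invoice.
pairs = df_unique.merge(df_unique, on='invoiceNo', suffixes=('_a', '_b'))

# Keep each unordered pair once and drop self-pairs by requiring item_a < item_b.
pairs = pairs[pairs['item_a'] < pairs['item_b']]

# Count the invoices containing each pair, most frequent first.
pair_counts = (pairs.groupby(['item_a', 'item_b'])
                    .size()
                    .sort_values(ascending=False))
print(pair_counts)
```

The self-merge grows quadratically with the number of items on a single invoice, so it is worth checking memory use if some invoices are very large.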
My first attempt was to collect all unique invoice IDs so I could loop through them:

```python
unique_invoice_id = df.invoiceNo.unique().tolist()
```
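Instead of building `unique_invoice_id` and filtering the DataFrame once per ID, iterating over `df.groupby('invoiceNo')` yields each invoice's items directly. A minimal sketch, assuming only pairs (not larger item sets) matter, counts unordered pairs with `itertools.combinations` and `collections.Counter`; `pair_counter` is a hypothetical name.

```python
from collections import Counter
from itertools import combinations

import pandas as pd

df = pd.DataFrame({'invoiceNo': ['123', '123', '124', '124'],
                   'item': ['plant', 'grass', 'hammer', 'screwdriver']})

pair_counter = Counter()
# groupby yields (invoiceNo, items) for each invoice, replacing the manual ID loop.
for _, items in df.groupby('invoiceNo')['item']:
    # De-duplicate and sort so ('grass', 'plant') and ('plant', 'grass') are the same key.
    for pair in combinations(sorted(set(items)), 2):
        pair_counter[pair] += 1

# The ten most frequent pairs with their invoice counts.
print(pair_counter.most_common(10))
```

Sorting each basket before calling `combinations` is what makes the pair keys order-independent.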