I am working on sequential and frequent pattern mining. I was given a dataset like the one below for the task, and I was told to build a sequence from the dataset before processing it.
This is sample data taken from the dataset, in table format. The table in .csv format is available at: https://drive.google.com/file/d/1j1rEy4Q600y_oym23cG3m3NNWuNvIcgG/view?usp=sharing
User | Item 1 | Item 2 | Item 3 | Item 4 | Item 5 | Item 6 |
---|---|---|---|---|---|---|
A | milk | cake | citrus | | | |
B | cheese | milk | bread | cabbage | carrot | |
A | tea | juice | citrus | salmon | | |
B | apple | orange | | | | |
B | cake | | | | | |
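For reproducibility, here is a minimal sketch that builds the sample above as a DataFrame. It assumes the CSV uses the header names shown in the table and leaves empty cells blank; the inline `csv_text` string is only a stand-in for the real file.

```python
import io

import pandas as pd

# Sample rows from the table above, inlined as CSV text so the snippet is
# self-contained (assumption: the real file has the same header and layout).
csv_text = """User,Item 1,Item 2,Item 3,Item 4,Item 5,Item 6
A,milk,cake,citrus,,,
B,cheese,milk,bread,cabbage,carrot,
A,tea,juice,citrus,salmon,,
B,apple,orange,,,,
B,cake,,,,,
"""

# Empty cells are read as NaN by default.
df = pd.read_csv(io.StringIO(csv_text))
print(df)
```

With the real file, `io.StringIO(csv_text)` would be replaced by the path to the downloaded CSV.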
At first, I think I have to load the CSV file into a pandas DataFrame. I have no problem with that; what I want to ask is how to produce results like the following from the DataFrame.
Expected result 1: the items from each row (one transaction by one user) are grouped into one tuple, and the tuples are collected per user.
User | Transactions |
---|---|
A | (milk cake citrus)(tea juice citrus salmon) |
B | (cheese milk bread cabbage carrot)(apple orange)(cake) |
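One possible direction for this shape, as a minimal sketch: turn each row into a tuple of its non-empty items, then collect the tuples per user with `groupby`. The names `per_row`, `Basket`, and `As text` are made up for illustration, and the sample rows are hard-coded so the snippet runs on its own.

```python
import pandas as pd

columns = ["User"] + [f"Item {i}" for i in range(1, 7)]
rows = [
    ["A", "milk", "cake", "citrus", None, None, None],
    ["B", "cheese", "milk", "bread", "cabbage", "carrot", None],
    ["A", "tea", "juice", "citrus", "salmon", None, None],
    ["B", "apple", "orange", None, None, None, None],
    ["B", "cake", None, None, None, None, None],
]
df = pd.DataFrame(rows, columns=columns)
item_cols = columns[1:]

# One tuple per row (one transaction), skipping the empty cells.
baskets = [
    tuple(item for item in row if pd.notna(item))
    for row in df[item_cols].to_numpy()
]
per_row = pd.DataFrame({"User": df["User"], "Basket": baskets})

# Collect each user's tuples into a list, keeping the original row order.
grouped = (
    per_row.groupby("User")["Basket"]
    .apply(list)
    .reset_index(name="Transactions")
)

# Optional: render the list of tuples in the "(a b c)(d e)" style shown above.
grouped["As text"] = grouped["Transactions"].map(
    lambda txs: "".join("(" + " ".join(t) + ")" for t in txs)
)
print(grouped)
```

Keeping `Transactions` as a list of tuples is probably more useful for a mining library than the text form; the `As text` column is only there to match the layout of the table.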
Expected result 2: all items purchased by a user are listed together in order, without grouping them by transaction.
User | Transactions |
---|---|
A | milk, cake, citrus, tea, juice, citrus, salmon |
B | cheese, milk, bread, cabbage, carrot, apple, orange, cake |
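For the flat version, a similar sketch: build one list of non-empty items per row, use `DataFrame.explode` to get one item per line, then join the items per user. The names `per_row`, `long_form`, and `Items` are illustrative, and the sample rows are hard-coded again so the snippet stands alone.

```python
import pandas as pd

columns = ["User"] + [f"Item {i}" for i in range(1, 7)]
rows = [
    ["A", "milk", "cake", "citrus", None, None, None],
    ["B", "cheese", "milk", "bread", "cabbage", "carrot", None],
    ["A", "tea", "juice", "citrus", "salmon", None, None],
    ["B", "apple", "orange", None, None, None, None],
    ["B", "cake", None, None, None, None, None],
]
df = pd.DataFrame(rows, columns=columns)
item_cols = columns[1:]

# One list of non-empty items per row, in column order.
items = [
    [item for item in row if pd.notna(item)]
    for row in df[item_cols].to_numpy()
]
per_row = pd.DataFrame({"User": df["User"], "Items": items})

# explode() yields one line per item and keeps the row order,
# so each user's items stay in purchase order.
long_form = per_row.explode("Items")

flat = (
    long_form.groupby("User")["Items"]
    .agg(", ".join)
    .reset_index(name="Transactions")
)
print(flat)
```

If a list per user is preferred over a comma-separated string, `.agg(list)` can be used in place of `.agg(", ".join)`.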
My question is: how do I build those DataFrames? I've tried a solution from this article: How to group dataframe rows into list in pandas groupby, but I have not been successful so far.