I have a DataFrame where one of the columns is `item` and there is a non-unique field `id`. So first, I'm grouping by `id`:
```python
grouped = df.groupby('id')
```
Now I can iterate over each group like so:
```python
for name, group in grouped:
    ...  # name is the id value, group is the sub-DataFrame for that id
```
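To make the loop concrete, here is a small sketch on a hypothetical toy DataFrame (the data values are invented for illustration; only the `id` and `item` column names come from the question). Each iteration yields the group key and the sub-DataFrame of rows sharing that key:

```python
import pandas as pd

# Hypothetical toy data with the same two columns as in the question
df = pd.DataFrame({
    "id":   [1, 1, 2, 2, 2],
    "item": ["a", "b", "a", "a", "c"],
})

grouped = df.groupby("id")
items_per_group = {}
for name, group in grouped:
    # name is the id value; group holds only the rows with that id
    items_per_group[name] = group["item"].tolist()

print(items_per_group)
```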
I can also get an array of all unique items with:
```python
all_items = df['item'].unique()
```
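Combining the group loop with `all_items`, one possible sketch (assuming a hypothetical toy DataFrame) counts items per group with `value_counts()` and then uses `reindex(all_items, fill_value=0)` so every vector has length `len(all_items)`, the same column order, and zeros for items missing from a group:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; the real df is assumed to have 'id' and 'item' columns
df = pd.DataFrame({
    "id":   [1, 1, 2, 2, 2],
    "item": ["a", "b", "a", "a", "c"],
})
all_items = df["item"].unique()  # order of first appearance: a, b, c

rows = []
ids = []
for name, group in df.groupby("id"):
    # Count occurrences of each item in this group, then align the counts
    # to all_items; items that never appear in the group get 0
    counts = group["item"].value_counts().reindex(all_items, fill_value=0)
    rows.append(counts.to_numpy())
    ids.append(name)

# Stack the per-group vectors into a (n_groups, len(all_items)) matrix
X = np.vstack(rows)
print(X)
```

The list `ids` records which `id` each row of `X` came from, which is useful if the rows later need to be matched back to labels.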
What I'd like to do is, for each group, get a list/vector of size `len(all_items)` containing the number of times each item appeared in that group. My main goal is to have a NumPy matrix of these vectors so I can process it with scikit-learn models.
How can I do that?
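A vectorized alternative, sketched here on a hypothetical toy DataFrame, avoids the explicit loop: `pd.crosstab` builds a table with one row per `id`, one column per `item`, and occurrence counts in the cells, and `.to_numpy()` turns it into a dense matrix suitable for scikit-learn:

```python
import pandas as pd

# Hypothetical toy data; the real df is assumed to have 'id' and 'item' columns
df = pd.DataFrame({
    "id":   [1, 1, 2, 2, 2],
    "item": ["a", "b", "a", "a", "c"],
})

# Rows = distinct ids, columns = distinct items, cells = occurrence counts
counts = pd.crosstab(df["id"], df["item"])

X = counts.to_numpy()                  # dense count matrix
row_ids = counts.index.to_numpy()      # which id each row belongs to
col_items = counts.columns.to_numpy()  # which item each column counts
print(counts)
```

Note that `crosstab` sorts both the ids and the items, whereas `unique()` keeps first-appearance order; if the columns must follow `all_items`, `counts.reindex(columns=all_items, fill_value=0)` should align them. `df.groupby("id")["item"].value_counts().unstack(fill_value=0)` is another way to build the same table.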