I am very new to Python/pandas and could really use some pointers on a data-manipulation task. I have a DataFrame structured like this:
Name   Books                  Cars                   ...
Sally  ["A", "B", "C"]        ["A", "P", "G", "E"]
Bob    ["C", "D"]             ["P", "L", "M"]
Ryan   ["A", "C", "D", "Z"]   NaN
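To make the example reproducible, here is a minimal sketch of the sample frame, assuming the list columns hold plain Python lists and the missing entry is NaN. Only Books and Cars are shown; the real frame has many more columns.

```python
import numpy as np
import pandas as pd

# Sample data reconstructed from the table above; only two of the
# many list columns are included, and NaN marks a missing list.
df = pd.DataFrame({
    "Name": ["Sally", "Bob", "Ryan"],
    "Books": [["A", "B", "C"], ["C", "D"], ["A", "C", "D", "Z"]],
    "Cars": [["A", "P", "G", "E"], ["P", "L", "M"], np.nan],
})
print(df)
```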
There are over 1000 columns. What I want is something that looks like this:
Name   A  B  C  D  E  Z  P  G  L  M
Sally  2  1  1  0  1  0  1  1  0  0
Bob    ...
Ryan   ...
where each number is how often that element appears across all of that person's lists combined (e.g. Sally has "A" once in Books and once in Cars, hence 2).
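For a single person, the count described above is just a frequency count over all of their lists flattened together. A small illustration with `collections.Counter`, using only Sally's Books and Cars lists from the example (`sally_lists` is a hypothetical name):

```python
from collections import Counter

# Sally's lists from the example (only the Books and Cars columns shown).
sally_lists = [["A", "B", "C"], ["A", "P", "G", "E"]]

# Flatten all of her lists and count each element.
counts = Counter(item for lst in sally_lists for item in lst)

# "A" appears once in Books and once in Cars; elements she never has,
# such as "D", are absent from the Counter, which reports 0 for them.
print(counts["A"], counts["D"])  # 2 0
```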
I think that my general approach should be:
- Explode ALL the list columns. I tried using a lambda like this:
df.apply(lambda x: pd.Series.explode)
but I am still lost on how to apply explode to all columns at once.
- Use a function to count the frequency of each entry after exploding.
- Arrange the counts so that each element gets its own column and each individual gets one row.
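One possible way to carry out the three steps above is sketched below, assuming every column except Name holds either a list or NaN. Because the lists in a row have different lengths, exploding the original columns side by side would not line up, so this sketch first melts all list columns into a single column with `DataFrame.melt`, explodes that one column, and then counts with `pd.crosstab`. The data is only the three-row example, and `long` and `result` are illustrative names.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Name": ["Sally", "Bob", "Ryan"],
    "Books": [["A", "B", "C"], ["C", "D"], ["A", "C", "D", "Z"]],
    "Cars": [["A", "P", "G", "E"], ["P", "L", "M"], np.nan],
})

# Step 1: reshape wide -> long. melt() stacks every non-"Name" column into a
# single "item" column, so this works the same for 2 columns or 1000+.
long = df.melt(id_vars="Name", value_name="item")

# Step 2: one row per list element. NaN cells (and empty lists) are NaN
# after explode, so drop them; ignore_index=True keeps the index unique.
long = long.explode("item", ignore_index=True).dropna(subset=["item"])

# Step 3: count each (Name, item) pair and pivot the items into columns;
# crosstab fills combinations that never occur with 0.
result = pd.crosstab(long["Name"], long["item"])
print(result)
```

Note that `crosstab` sorts both rows and columns; if the original row order matters, `result.reindex(df["Name"])` should restore it. `long.groupby("Name")["item"].value_counts().unstack(fill_value=0)` is an equivalent alternative for the counting step.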
Any advice is appreciated on how I should go about creating this. Thank you!