I want to create a unique dataset of fruits. I don't know in advance all the types (e.g. colour, store, price) that could appear under each fruit. Each type could also have duplicate rows. Is there a way to detect all possible duplicates and capture all unique information in a fully generalisable way?
type val detail
0 fruit apple
1 colour green greenish
2 colour yellow
3 store walmart usa
4 price 10
5 NaN
6 fruit banana
7 colour yellow
8 fruit pear
9 fruit jackfruit
...
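For reproducibility, the sample rows above can be rebuilt roughly like this. This is only a sketch of the visible rows: I am assuming that blank cells are NaN, that row 5 is NaN in every column, and that `10` is stored as a number. The real data continues past row 9 (the `...`), which is not reproduced here.

```python
import numpy as np
import pandas as pd

# Rows 0-9 copied from the table above.
# Assumptions: blank cells are NaN, and row 5 is NaN in all three columns.
df = pd.DataFrame(
    {
        "type": ["fruit", "colour", "colour", "store", "price", np.nan,
                 "fruit", "colour", "fruit", "fruit"],
        "val": ["apple", "green", "yellow", "walmart", 10, np.nan,
                "banana", "yellow", "pear", "jackfruit"],
        "detail": [np.nan, "greenish", np.nan, "usa", np.nan, np.nan,
                   np.nan, np.nan, np.nan, np.nan],
    }
)

print(df)
```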
Expected Output
fruit colour store price detail ...
0 apple [green, yellow] [walmart] [10] [greenish, usa]
1 banana [yellow] NaN NaN
2 pear NaN NaN NaN
3 jackfruit NaN NaN NaN
I tried the following, but it does not get close to the expected output. It does not show the column names either.
df.groupby("type")["val"].agg(size=len, set=lambda x: set(x))
0 fruit {"apple",...}
1 colour ...