0

I am very new to Python/pandas and quite direly need some pointers on how to proceed with data manipulation. I have a dataframe that is structured like this:

Name         Books                     Cars              ......

Sally      ["A", "B", "C"]         ["A", "P", "G", "E"]
Bob        ["C", "D"]              ["P", "L", "M"]
Ryan       ["A", "C", "D", "Z"]        NaN

There are over 1000 columns. What I want is something that looks like this:

Name      A   B   C   D   E   Z   P   G   L   M

Sally.    2.  1.  1.  0.  1.  0.  1.  1.  0.  0
Bob.       etc...
Ryan

where the numbers represent the frequency of the elements in the aggregate lists corresponding to the individual.

I think that my general approach should be:

  1. Explode ALL columns (but I don't know how to do this all at once). I have tried to use lambda like so:
df.apply(lambda x: pd.Series.explode)

But still quite lost on how to apply the explode to all columns at once.

  1. Use a function to count the frequency of each entry after exploded.
  2. Arrange the frequency counts to the corresponding column and individual.

Any advice is appreciated on how I should go about creating this. Thank you!

  • 1
    Hi.. Welcome to SO, please have a tour on how to create a [`reproducible-pandas-examples`](https://stackoverflow.com/q/20109391/4985099) – sushanth Nov 06 '20 at 12:13

1 Answers1

0

I would convert the dataframe to a list and do the operation.

This code is not computationally efficient, but I think it should do the job.

df_lst = df.values.tolist()

unique_name_set = {}
for row in df_lst:
    for name_lst in row[1:]:
        unique_name_set.update(name_lst)

temp_dict = dict.fromkeys(dict.fromkeys(["Name"] + list(unique_name_set)), [])

for row in df_lst:
    person_name = row[0]
    for unique_name in unique_name_set:
        freq = 0
        for name_lst in row[1:]:
            freq += name_lst.count(unique_name)
        temp_dict['unique name'].append(freq)

 df = pd.DataFrame.from_dict(temp_dict)

        
    
DKDK
  • 302
  • 2
  • 10