
I have a pandas DataFrame column that contains lists of strings of varying lengths, like df['category'] below:

category                                                                           | ...
---------
['Grocery & Gourmet Food', 'Cooking & Baking', 'Lard & Shortening', 'Shortening']  | ...
['Grocery & Gourmet Food', 'Candy & Chocolate', 'Mints']                           | ...
['Grocery & Gourmet Food', 'Soups, Stocks & Broths', 'Broths', 'Chicken']          | ...

Now I want to split this category column into separate columns, one for each string in the list. Is this possible with pandas? And how should I handle the column names?

I have gone through the answers to this question, but the difference is that my lists are not always the same length.

My expected output would be something like below:

category_1             | category_2       | category_n        | other_columns
-----------------------|------------------|-------------------|--------------
Grocery & Gourmet Food | Cooking & Baking | Lard & Shortening | ...
...                    | ...              | ...               | ...
sksoumik
  • Will you please add a sample of your expected output? –  Dec 26 '21 at 17:21
  • for example [this answer](https://stackoverflow.com/a/57498593/10197418) to the linked question works well with lists of different sizes, column names would be auto-generated. – FObersteiner Dec 26 '21 at 17:25

1 Answer


I would do something like this:

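# Use the longest list's length for the column count; .max() on the lists
# themselves compares them element by element, not by length.
df2 = pd.DataFrame(df['category'].to_list(), columns=[f"category_{i+1}" for i in range(df['category'].str.len().max())])
df = pd.concat([df.drop('category', axis=1), df2], axis=1)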

Output:

               category_1              category_2         category_3  \
0  Grocery & Gourmet Food        Cooking & Baking  Lard & Shortening   
1  Grocery & Gourmet Food       Candy & Chocolate              Mints   
2  Grocery & Gourmet Food  Soups, Stocks & Broths             Broths   

   category_4  
0  Shortening  
1        None  
2     Chicken 

Edit:

As @mozway suggested, it is better to create the columns with their default names and then update them:

df2 = pd.DataFrame(df['category'].to_list())
df2.columns = df2.columns.map(lambda x: f'category_{x+1}')
df = pd.concat([df.drop('category', axis=1), df2], axis=1)
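
For anyone who wants to try this end to end, here is a minimal self-contained sketch of the same approach. The `price` column and its values are made up for illustration, and passing `index=df.index` is my own addition (not part of the snippet above) so that `concat` still lines the rows up if `df` does not have a default index.

```python
import pandas as pd

# Sample data from the question; "price" is a hypothetical extra column.
df = pd.DataFrame({
    "category": [
        ["Grocery & Gourmet Food", "Cooking & Baking", "Lard & Shortening", "Shortening"],
        ["Grocery & Gourmet Food", "Candy & Chocolate", "Mints"],
        ["Grocery & Gourmet Food", "Soups, Stocks & Broths", "Broths", "Chicken"],
    ],
    "price": [3.5, 1.2, 2.8],
})

# Expand the lists into columns; shorter lists are padded with missing values.
# Reusing df's index keeps the rows aligned in the concat below.
expanded = pd.DataFrame(df["category"].to_list(), index=df.index)

# Rename the default integer column labels 0, 1, 2, ... to category_1, category_2, ...
expanded.columns = [f"category_{i + 1}" for i in expanded.columns]

result = pd.concat([df.drop(columns="category"), expanded], axis=1)
print(result)
```

Because the column names are derived from whatever columns `expanded` ends up with, there is no need to work out the longest list beforehand.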
SpicyPhoenix
  • Better create first `df2` with default column names, then update with `df2.columns = df2.columns.map(lambda x: f'category_{x+1}')` – mozway Dec 26 '21 at 17:52
  • @mozway True, that's actually better. Thanks, I edited my answer. – SpicyPhoenix Dec 26 '21 at 18:04
  • @SpicyPhoenix This edited answer creates another column named `category_1` which is just another copy of the `category` column. – sksoumik Dec 26 '21 at 18:26
  • @sksoumik That's weird. I've just tested it myself and the output is the same as the one I posted. https://i.imgur.com/hpa7R0v.png – SpicyPhoenix Dec 26 '21 at 19:11
  • @SpicyPhoenix, sorry. My category column's items were actually `str` not list. That's why it was not working. I converted it to list using `ast`. Now, it's working. Thanks – sksoumik Dec 26 '21 at 20:06
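
The string-to-list conversion mentioned in the last comment could look roughly like the sketch below. It assumes the column holds the string representation of Python lists (which is common after a CSV round trip) and that `ast.literal_eval` is the `ast` function that was used; the comment itself only says `ast`.

```python
import ast

import pandas as pd

# Each cell is a string that merely looks like a list.
df = pd.DataFrame({
    "category": [
        "['Grocery & Gourmet Food', 'Candy & Chocolate', 'Mints']",
        "['Grocery & Gourmet Food', 'Soups, Stocks & Broths', 'Broths', 'Chicken']",
    ],
})

# literal_eval safely parses Python literals (no arbitrary code execution),
# turning each string into a real list.
df["category"] = df["category"].apply(ast.literal_eval)

print(type(df.loc[0, "category"]))
```

After this step the column contains actual lists, so `df['category'].to_list()` yields a list of lists that `pd.DataFrame` can expand into columns.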