Split a string of categories into separate DataFrame columns

Question

I have a DataFrame column that holds comma-separated categories:

    import pandas as pd

    data = {'People': ['John', 'Mary', 'Andy', 'April'],
            'Class': ['Math, Science', 'English, Math, Science', 'Math, Science', 'Science, English, Math']}

    df = pd.DataFrame(data, columns=['People', 'Class'])

How may I create new columns and transform the Dataframe into:

> | People | Math | Science | English |
> |--------|------|---------|---------|
> | John   | Math | Science |         |
> | Mary   | Math | Science | English |
> | Andy   | Math | Science |         |
> | April  | Math | Science | English |

Does this answer your question? [How to split a column into two columns?](https://stackoverflow.com/questions/14745022/how-to-split-a-column-into-two-columns) — Trenton McKinney, Aug 25 '20 at 18:40

score 1 · Answer 1 · answered Aug 25 '20 at 18:20

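The following code may help:

# Collect every distinct class name across all rows
columns = {x.strip() for lst in df['Class'] for x in lst.split(',')}
for col in columns:
    df[col] = ''  # start each new column as empty strings

# Write the class name into its column for every person who takes it
for idx, val in df['Class'].items():
    for value in val.split(','):
        # a single .loc[row, column] call assigns into df itself, not a temporary copy
        df.loc[idx, value.strip()] = value.strip()
df = df.drop(columns='Class')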

Output (the order of the new columns can vary between runs, since a set is unordered):

  People  Science  English  Math
0   John  Science           Math
1   Mary  Science  English  Math
2   Andy  Science           Math
3  April  Science  English  Math

score 1 · Answer 2 · answered Aug 25 '20 at 18:32

Here is a solution:

# Remove the whitespace after each comma, then use get_dummies to build 0/1 indicator columns

df = df.set_index('People')

dummies = (
    df.Class.str.replace(r',\s+', ",", regex=True)
        .str.get_dummies(sep=",")
)

        English  Math  Science
People                        
John          0     1        1
Mary          1     1        1
Andy          0     1        1
April         1     1        1

# Build a lookup from column position (starting at 1) to column name
replace_ = {i : j for i, j in enumerate(dummies.columns, 1)}

# Multiply each column by its position, then replace the positions with the column names.
dummies.mul(list(replace_.keys())).replace(replace_)

        English  Math  Science
People                        
John          0  Math  Science
Mary    English  Math  Science
Andy          0  Math  Science
April   English  Math  Science
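
The result above still shows 0 where a class is absent, while the question asked for blanks. A minimal self-contained sketch of the same multiply-and-replace idea, with one extra lookup entry that maps 0 to an empty string, could look like this. The names `positions`, `lookup` and `result` are only illustrative and are not part of the answer's code.

```python
import pandas as pd

data = {
    'People': ['John', 'Mary', 'Andy', 'April'],
    'Class': ['Math, Science', 'English, Math, Science',
              'Math, Science', 'Science, English, Math'],
}
df = pd.DataFrame(data, columns=['People', 'Class']).set_index('People')

# 0/1 indicator columns, one per class
dummies = (
    df['Class'].str.replace(r',\s+', ',', regex=True)
      .str.get_dummies(sep=',')
)

# Column positions 1..n map to column names; 0 (class absent) maps to ''
positions = range(1, len(dummies.columns) + 1)
lookup = dict(zip(positions, dummies.columns))
lookup[0] = ''

# Multiply each column by its position, then translate the numbers to labels
result = dummies.mul(list(positions)).replace(lookup)
print(result)
```

If People should be a regular column again, as in the question, `result.reset_index()` should do that.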

Trenton McKinney · Accepted Answer · 2020-08-25T19:18:20.430

Use .str.get_dummies to get a table of 1s and 0s for the Class column.
Use np.where to replace each 1 with the column name and each 0 with an empty string.
df.Class.str.get_dummies(', ').apply(lambda x: np.where(x == 1, x.name, '')) creates a separate DataFrame, which .join combines back with df.
Use .drop to remove the Class column, which is no longer needed.

import pandas as pd
import numpy as np

updated = df.join(df.Class.str.get_dummies(', ').apply(lambda x: np.where(x == 1, x.name, ''))).drop(columns=['Class'])

# display(updated)
  People  English  Math  Science
0   John           Math  Science
1   Mary  English  Math  Science
2   Andy           Math  Science
3  April  English  Math  Science
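
For anyone who wants to run this end to end, here is a sketch that rebuilds the question's frame and breaks the one-liner into separate steps so the intermediate tables can be inspected. The intermediate names `dummies` and `labels` are only for illustration; the logic is meant to be the same as in the one-liner.

```python
import numpy as np
import pandas as pd

data = {
    'People': ['John', 'Mary', 'Andy', 'April'],
    'Class': ['Math, Science', 'English, Math, Science',
              'Math, Science', 'Science, English, Math'],
}
df = pd.DataFrame(data, columns=['People', 'Class'])

# Step 1: one 0/1 indicator column per class, split on ', '
dummies = df['Class'].str.get_dummies(', ')

# Step 2: per column, swap each 1 for the column name and each 0 for ''
labels = dummies.apply(lambda x: np.where(x == 1, x.name, ''))

# Step 3: attach the label columns to df and drop the original Class column
updated = df.join(labels).drop(columns=['Class'])
print(updated)
```

Note that the separator passed to get_dummies is ', ' (comma plus one space), so this sketch assumes the classes are always separated exactly that way.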

Good use of dummies :) — Suryaveer Singh, Aug 27 '20 at 19:13
