3

How can I convert a list of sets of categories into an indicator DataFrame?

For example:

A = [{'a', 'c'}, {'a', 'b'}, {'b', 'd'}, {'e'}]

To:

   a  b  c  d  e
1  1  0  1  0  0
2  1  1  0  0  0
3  0  1  0  1  0
4  0  0  0  0  1

3 Answers

5

Let's try explode then crosstab:

import pandas as pd

s = pd.Series(A).explode()
pd.crosstab(s.index, s)

Output:

col_0  a  b  c  d  e
row_0               
0      1  0  1  0  0
1      1  1  0  0  0
2      0  1  0  1  0
3      0  0  0  0  1
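
As a self-contained sketch of the explode and crosstab steps above: the final cleanup (dropping the axis names and switching to 1-based row labels) is an assumption about the layout asked for in the question, not part of the original answer, and the name `result` is only illustrative.

```python
import pandas as pd

A = [{'a', 'c'}, {'a', 'b'}, {'b', 'd'}, {'e'}]

# One row per (original position, category) pair; the index repeats per set.
s = pd.Series(A).explode()

# Count how often each category occurs at each original position.
result = pd.crosstab(s.index, s)

# Assumed cosmetic cleanup: remove the 'row_0'/'col_0' axis names and
# relabel the rows 1..n to match the table in the question.
result = result.rename_axis(index=None, columns=None)
result.index = result.index + 1

print(result)
```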

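Option 2: call get_dummies on the exploded Series and sum per original row (sum(level=0) was removed in pandas 2.0, so group by the index level instead):

pd.get_dummies(pd.Series(A).explode()).groupby(level=0).sum()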

Output:

   a  b  c  d  e
0  1  0  1  0  0
1  1  1  0  0  0
2  0  1  0  1  0
3  0  0  0  0  1
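
To make the intermediate steps of Option 2 visible, here is a minimal sketch that splits the one-liner apart. The extra empty set and the `dtype=int` argument are additions for illustration, not part of the original answer; the empty set is there to show that such an entry comes out as an all-zero row with this approach.

```python
import pandas as pd

# Data from the question, plus an empty set added here for illustration.
A = [{'a', 'c'}, {'a', 'b'}, {'b', 'd'}, {'e'}, set()]

# explode() repeats the original index once per element; an empty set
# becomes a single NaN entry.
s = pd.Series(A).explode()

# One indicator row per exploded element; NaN gets no indicator column.
dummies = pd.get_dummies(s, dtype=int)

# Collapse back to one row per original set by summing within each index value.
result = dummies.groupby(level=0).sum()

print(result)
```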
Quang Hoang
1

You can join each set into a single "|"-delimited string, wrap the strings in a Series, and then apply str.get_dummies, which splits on "|" by default:

pd.Series(["|".join(entry) for entry in A]).str.get_dummies()

    a   b   c   d   e
0   1   0   1   0   0
1   1   1   0   0   0
2   0   1   0   1   0
3   0   0   0   0   1
sammywemmy
0
df = (
    pd.get_dummies(pd.DataFrame(A), prefix=['', ''])
    .T                                  # groupby(axis=1) is deprecated, so group the transposed rows
    .groupby(lambda x: x.strip('_'))    # '_a' -> 'a', merging duplicate labels
    .sum()
    .T
)

Result is

   a  b  c  d  e
0  1  0  1  0  0
1  1  1  0  0  0
2  0  1  0  1  0
3  0  0  0  0  1
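
The prefix list above has one entry per column of pd.DataFrame(A), so it assumes that no set holds more than two elements. A possible variation, sketched here with an extra three-element set added for illustration, passes a single empty prefix and separator so that the column count no longer matters; `wide`, `dummies`, and `result` are hypothetical names.

```python
import pandas as pd

# Data from the question, plus a three-element set added for illustration.
A = [{'a', 'c'}, {'a', 'b'}, {'b', 'd'}, {'e'}, {'a', 'd', 'e'}]

# Each set is spread across positional columns; shorter sets are padded
# with missing values.
wide = pd.DataFrame(A)

# An empty prefix and separator leave the bare category as the column name,
# so the same label can appear once per positional column.
dummies = pd.get_dummies(wide, prefix='', prefix_sep='', dtype=int)

# Merge the duplicate labels by grouping the transposed frame on its index.
result = dummies.T.groupby(level=0).sum().T

print(result)
```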
Alex Savitsky
wwnde