I need to convert categorical values to column names and fill with zeros and ones.

import pandas as pd

x = pd.DataFrame({'province': ['Ontario', 'Manitoba', 'Quebec'], 'species': ['a', 'b', 'c']})

   province species
0   Ontario       a
1  Manitoba       b
2    Quebec       c

I want to reshape the data frame above so that the values in species turn into column names, and the values of the new columns indicate presence or absence. The new data frame should look like this:

   province  a  b  c
0   Ontario  1  0  0
1  Manitoba  0  1  0
2    Quebec  0  0  1
NBK
  • `x = pd.get_dummies(x, columns=['species'])` like [this answer](https://stackoverflow.com/a/36285489) or `x = pd.get_dummies(x, columns=['species'], prefix='', prefix_sep='')` for exact output. – Henry Ecker Jan 15 '22 at 21:45
  • @Henry, I didn't find an option not to add the prefix though – mozway Jan 15 '22 at 21:46
  • @mozway The second option with `, prefix='', prefix_sep='')` works fine no? Like [here](https://stackoverflow.com/a/62902495/15497888) – Henry Ecker Jan 15 '22 at 21:46
  • You're right, I had tried False for some reason – mozway Jan 15 '22 at 21:47

1 Answer

You can use crosstab:

(pd.crosstab(x['province'], x['species'])
   .reset_index().rename_axis(None, axis=1)
)

output:

   province  a  b  c
0  Manitoba  0  1  0
1   Ontario  1  0  0
2    Quebec  0  0  1

NB. crosstab counts occurrences, so if a province/species pair appears more than once the cell will hold 2, 3, etc. instead of 1. It also returns the rows sorted by province, as the output above shows.
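
If duplicates are possible and only presence/absence is wanted, one option is to cap the counts at 1. This is a minimal sketch, not part of the original answer; the `dup` frame with a repeated Ontario/a row is hypothetical data made up for illustration.

```python
import pandas as pd

# Hypothetical data with a repeated Ontario/a pair (not from the question).
dup = pd.DataFrame({
    'province': ['Ontario', 'Ontario', 'Manitoba', 'Quebec'],
    'species': ['a', 'a', 'b', 'c'],
})

# crosstab counts occurrences, so Ontario/a is 2 here.
counts = pd.crosstab(dup['province'], dup['species'])

# clip(upper=1) caps every count at 1, turning counts into presence/absence.
presence = counts.clip(upper=1).reset_index().rename_axis(None, axis=1)
print(presence)
```

Clipping keeps one row per province, which is usually what a presence/absence table needs.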

or get_dummies:

pd.get_dummies(x, columns=['species'], prefix='', prefix_sep='')

output:

   province  a  b  c
0   Ontario  1  0  0
1  Manitoba  0  1  0
2    Quebec  0  0  1
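
On pandas 2.0 and later, get_dummies returns boolean columns by default, so the same call prints True/False rather than 1/0. A minimal sketch, assuming integer output is wanted: passing `dtype=int` (an existing get_dummies parameter, though not used in the original answer) gives the 0/1 table on either version.

```python
import pandas as pd

x = pd.DataFrame({'province': ['Ontario', 'Manitoba', 'Quebec'],
                  'species': ['a', 'b', 'c']})

# dtype=int forces integer 0/1 columns regardless of the version's default dtype.
# prefix='' and prefix_sep='' keep the new column names as the bare species values.
out = pd.get_dummies(x, columns=['species'], prefix='', prefix_sep='', dtype=int)
print(out)
```

Unlike crosstab, this keeps the original row order and one output row per input row.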
mozway