Transform a column with categorical data into one column for each category

Question

I have a DataFrame that looks like this:

df  index    id           timestamp   cat  value
0   8066     101  2012-03-01 09:00:29  A      1   
1   8067     101  2012-03-01 09:01:15  B      0   
2   8068     101  2012-03-01 09:40:18  C      1
3   8069     102  2012-03-01 09:40:18  C      0

What I want is something like this:

df           timestamp           A     B     C     id      value
0        2012-03-01 09:00:29     1     0     0    101        1
1        2012-03-01 09:01:15     0     1     0    101        0
2        2012-03-01 09:40:18     0     0     1    101        1
3        2012-03-01 09:40:18     0     0     1    102        0

As you can see in rows 2 and 3, timestamps can be duplicated. At first I tried using pivot (with timestamp as the index), but that didn't work because of those duplicates. I don't want to drop them, since the other data in those rows is different and should not be lost.

Since the index column contains no duplicates, I thought I could pivot over it and then merge the result back into the original DataFrame, but I was wondering if there is an easier, more intuitive solution.

Thanks!

score 1 · Answer 1 · answered Feb 03 '15 at 16:51


Here is a one-liner that will achieve what you want, assuming that your DataFrame is named df:

df_new = df.join(pd.get_dummies(df['cat'])).drop(['index', 'cat'], axis=1)
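A runnable sketch of this join approach, with the parentheses balanced so that drop is applied to the joined frame. The DataFrame is rebuilt by hand from the question's sample rows, and dtype=int is my own addition (an assumption about what is wanted): newer pandas versions return boolean dummy columns by default, and dtype=int gives the 0/1 values shown in the question.

```python
import pandas as pd

# Sample data reconstructed from the question; "index" is an ordinary column here.
df = pd.DataFrame({
    "index": [8066, 8067, 8068, 8069],
    "id": [101, 101, 101, 102],
    "timestamp": pd.to_datetime([
        "2012-03-01 09:00:29",
        "2012-03-01 09:01:15",
        "2012-03-01 09:40:18",
        "2012-03-01 09:40:18",
    ]),
    "cat": ["A", "B", "C", "C"],
    "value": [1, 0, 1, 0],
})

# One indicator column per category; the result shares df's row index,
# so join lines the rows up without needing a key.
dummies = pd.get_dummies(df["cat"], dtype=int)

# Join first, then drop the columns that are no longer needed.
df_new = df.join(dummies).drop(["index", "cat"], axis=1)
print(df_new)
```

Both rows with the duplicated timestamp survive, because nothing here uses timestamp as a key.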

answered Feb 03 '15 at 16:51

awhan


score 1 · Answer 2 · answered Feb 03 '15 at 16:57

Since get_dummies returns a DataFrame whose index is already aligned with your existing df, you can just concat column-wise:

In [66]:

pd.concat([df,pd.get_dummies(df['cat'])], axis=1)

Out[66]:
   index   id            timestamp cat  value  A  B  C
0   8066  101  2012-03-01 09:00:29   A      1  1  0  0
1   8067  101  2012-03-01 09:01:15   B      0  0  1  0
2   8068  101  2012-03-01 09:40:18   C      1  0  0  1
3   8069  102  2012-03-01 09:40:18   C      0  0  0  1

You can drop the 'cat' column with df.drop('cat', axis=1), which returns a new DataFrame rather than modifying df in place.
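Putting the concat and the drop together, here is a sketch that also reorders the columns to match the layout asked for in the question. The sample frame is rebuilt by hand from the question, the variable names are only illustrative, and dtype=int is an assumption on my part to get 0/1 instead of the booleans that newer pandas versions return.

```python
import pandas as pd

# Sample data reconstructed from the question.
df = pd.DataFrame({
    "index": [8066, 8067, 8068, 8069],
    "id": [101, 101, 101, 102],
    "timestamp": pd.to_datetime([
        "2012-03-01 09:00:29",
        "2012-03-01 09:01:15",
        "2012-03-01 09:40:18",
        "2012-03-01 09:40:18",
    ]),
    "cat": ["A", "B", "C", "C"],
    "value": [1, 0, 1, 0],
})

dummies = pd.get_dummies(df["cat"], dtype=int)

# Drop the unwanted columns, glue the indicator columns on column-wise,
# then pick the column order shown in the question's desired output.
result = pd.concat([df.drop(["index", "cat"], axis=1), dummies], axis=1)
result = result[["timestamp", "A", "B", "C", "id", "value"]]
print(result)
```

The concat aligns on the row index (0 to 3), not on timestamp, which is why the duplicated timestamp does not cause the problem that pivot ran into.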

score 0 · Accepted Answer · edited May 23 '17 at 12:06


Use get_dummies.

See here: http://pandas.pydata.org/pandas-docs/stable/generated/pandas.core.reshape.get_dummies.html

StackOverflow Example here: Create dummies from column with multiple values in pandas
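For what it's worth, get_dummies can also be handed the whole DataFrame with a columns argument, in which case it replaces the listed column with its indicator columns in a single call. This is a sketch on data rebuilt from the question, not necessarily what the linked pages show; the empty prefix and prefix_sep, and dtype=int, are my own choices to get plain A/B/C column names holding 0/1.

```python
import pandas as pd

# Sample data reconstructed from the question.
df = pd.DataFrame({
    "index": [8066, 8067, 8068, 8069],
    "id": [101, 101, 101, 102],
    "timestamp": pd.to_datetime([
        "2012-03-01 09:00:29",
        "2012-03-01 09:01:15",
        "2012-03-01 09:40:18",
        "2012-03-01 09:40:18",
    ]),
    "cat": ["A", "B", "C", "C"],
    "value": [1, 0, 1, 0],
})

# Encode only "cat"; the other columns pass through untouched and the
# original "cat" column is replaced by the indicator columns.
encoded = pd.get_dummies(df, columns=["cat"], prefix="", prefix_sep="", dtype=int)
print(encoded)
```

Without the prefix arguments the new columns would be named cat_A, cat_B and cat_C, which may actually be preferable if several columns are encoded at once.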

edited May 23 '17 at 12:06

Community


answered Feb 03 '15 at 16:40

Liam Foley


Hi, I got the same answer in #pydata just now. Thanks for posting, I must have missed it in the documentation. – Daniel Feb 03 '15 at 16:43
