-1

I have a DataFrame that looks like this:

df  index    id           timestamp   cat  value
0   8066     101  2012-03-01 09:00:29  A      1   
1   8067     101  2012-03-01 09:01:15  B      0   
2   8068     101  2012-03-01 09:40:18  C      1
3   8069     102  2012-03-01 09:40:18  C      0

What I want is something like this:

df           timestamp           A     B     C     id      value
0        2012-03-01 09:00:29     1     0     0    101        1
1        2012-03-01 09:01:15     0     1     0    101        0
2        2012-03-01 09:40:18     0     0     1    101        1
3        2012-03-01 09:40:18     0     0     1    102        0

As you can see in rows 2 and 3, timestamps can be duplicated. At first I tried using pivot (with timestamp as the index), but that didn't work because of those duplicates. I don't want to drop them, since the other data in those rows is different and should not be lost.

Since index contains no duplicates, I thought I could pivot over it and then merge the result back into the original DataFrame, but I was wondering if there is an easier, more intuitive solution.

Thanks!

Daniel

3 Answers

1

Here is a one-liner that will achieve what you want, assuming your DataFrame is named df and pandas is imported as pd:

df_new = df.join(pd.get_dummies(df['cat'])).drop(['index', 'cat'], axis=1)
awhan
1

get_dummies returns a DataFrame that shares the index of your existing df, so the rows are already aligned and you can simply concat column-wise:

In [66]:

pd.concat([df,pd.get_dummies(df['cat'])], axis=1)

Out[66]:
   index   id            timestamp cat  value  A  B  C
0   8066  101  2012-03-01 09:00:29   A      1  1  0  0
1   8067  101  2012-03-01 09:01:15   B      0  0  1  0
2   8068  101  2012-03-01 09:40:18   C      1  0  0  1
3   8069  102  2012-03-01 09:40:18   C      0  0  0  1

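You can then drop the 'cat' column from the result with .drop('cat', axis=1); note that drop returns a new DataFrame rather than modifying the original in place.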
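Putting the steps above together, here is a minimal sketch that rebuilds the sample data from the question and produces the requested column layout. The variable names dummies and result are just illustrative, and dtype=int is an assumption made so the indicator columns come out as 0/1 on recent pandas versions, where the default is boolean.

```python
import pandas as pd

# Sample data reconstructed from the question.
df = pd.DataFrame({
    "index": [8066, 8067, 8068, 8069],
    "id": [101, 101, 101, 102],
    "timestamp": pd.to_datetime([
        "2012-03-01 09:00:29",
        "2012-03-01 09:01:15",
        "2012-03-01 09:40:18",
        "2012-03-01 09:40:18",
    ]),
    "cat": ["A", "B", "C", "C"],
    "value": [1, 0, 1, 0],
})

# One indicator column per category; dtype=int gives 0/1 instead of booleans.
dummies = pd.get_dummies(df["cat"], dtype=int)

# Drop the columns that are no longer needed, then attach the indicators.
# Both frames share the same row labels, so rows line up one-to-one
# even though the timestamps themselves are duplicated.
result = pd.concat([df.drop(columns=["index", "cat"]), dummies], axis=1)

# Reorder to match the layout asked for in the question.
result = result[["timestamp", "A", "B", "C", "id", "value"]]
print(result)
```

Because the alignment happens on the row labels rather than on timestamp, no rows are lost and no pivot is needed.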

EdChum
0

Use get_dummies.

See here: http://pandas.pydata.org/pandas-docs/stable/generated/pandas.core.reshape.get_dummies.html

StackOverflow Example here: Create dummies from column with multiple values in pandas
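As a rough illustration of the get_dummies suggestion, the function can also be applied to the whole DataFrame with its columns parameter, which encodes only the listed columns and removes the originals. This is a sketch under a few assumptions: the empty prefix and prefix_sep are chosen here so the new columns are named plain A, B, C, dtype=int requires a reasonably recent pandas, and the name encoded is only illustrative.

```python
import pandas as pd

# Sample data reconstructed from the question (timestamps kept as strings here).
df = pd.DataFrame({
    "index": [8066, 8067, 8068, 8069],
    "id": [101, 101, 101, 102],
    "timestamp": [
        "2012-03-01 09:00:29",
        "2012-03-01 09:01:15",
        "2012-03-01 09:40:18",
        "2012-03-01 09:40:18",
    ],
    "cat": ["A", "B", "C", "C"],
    "value": [1, 0, 1, 0],
})

# Encode only 'cat'; the original column is replaced by the indicator columns.
# An empty prefix and separator keep the new column names as the bare categories.
encoded = pd.get_dummies(df, columns=["cat"], prefix="", prefix_sep="", dtype=int)
print(encoded)
```

This avoids the separate concat and drop steps, at the cost of the indicator columns being appended at the end of the frame.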

Liam Foley
  • Hi, I got the same answer in #pydata just now. Thanks for posting, I must have missed it in the documentation. – Daniel Feb 03 '15 at 16:43