I'm looking to encode a categorical column that is already numeric. I see plenty of examples that encode string-type categorical columns, but none for columns that have already been converted to numbers.

My goal is to convert this pandas dataframe:

+----+----+
|year|hour|
+----+----+
|2018|   1|
|2018|   3|
|2019|   4|
|2019|   4|
+----+----+

To:

+---------+---------+------+------+------+
|year_2018|year_2019|hour_1|hour_3|hour_4|
+---------+---------+------+------+------+
|        1|        0|     1|     0|     0|
|        1|        0|     0|     1|     0|
|        0|        1|     0|     0|     1|
|        0|        1|     0|     0|     1|
+---------+---------+------+------+------+
Sociopath
Lucas Mengual

1 Answer

You need `pd.get_dummies`, with the numeric columns named explicitly in `columns`:

import pandas as pd

df = pd.DataFrame({'year': [2018, 2018, 2019, 2019], 'hour': [1, 3, 4, 4]})

# Encode the listed numeric columns as indicator (dummy) columns
print(pd.get_dummies(df, columns=['year', 'hour'], drop_first=False))

Output:

   year_2018  year_2019  hour_1  hour_3  hour_4
0          1          0       1       0       0
1          1          0       0       1       0
2          0          1       0       0       1
3          0          1       0       0       1
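
A note on newer pandas: from version 2.0 onward the dummy columns default to boolean, so the same call prints True/False instead of 1/0. A minimal sketch, assuming you want the integer indicators shown above, passes `dtype=int` explicitly (the `encoded` variable name is only for illustration):

```python
import pandas as pd

df = pd.DataFrame({'year': [2018, 2018, 2019, 2019], 'hour': [1, 3, 4, 4]})

# dtype=int requests 0/1 integer indicators instead of the version's default dtype
encoded = pd.get_dummies(df, columns=['year', 'hour'], dtype=int)
print(encoded)
```

The column order follows the order given in `columns`, with each column's distinct values sorted.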
Sociopath
  • `drop_first=False` is the default value, so it can be omitted. Maybe the OP only needs `print(pd.get_dummies(df))` – jezrael Mar 19 '19 at 10:21
  • The OP may have more columns, and I added the `drop_first` parameter to my answer just for the sake of readability and better understanding. – Sociopath Mar 19 '19 at 10:22
  • Perfect! I was not sure get_dummies was the right one, but it works just fine, because I can apply it to selected columns of the dataframe. Thanks! – Lucas Mengual Mar 19 '19 at 10:43
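
One caveat on the shorter `pd.get_dummies(df)` form: when `columns` is not given, pandas only converts object, string, and category columns, so integer columns such as `year` and `hour` are left as they are. The sketch below illustrates this, assuming a hypothetical extra `value` column that should pass through unchanged:

```python
import pandas as pd

# 'value' is a hypothetical extra column, added only for illustration
df = pd.DataFrame({
    'year': [2018, 2018, 2019, 2019],
    'hour': [1, 3, 4, 4],
    'value': [0.5, 1.2, 3.3, 0.9],
})

# Without columns=, the numeric columns are not encoded at all
unchanged = pd.get_dummies(df)
print(list(unchanged.columns))

# Naming the columns encodes only those; 'value' is kept as-is
encoded = pd.get_dummies(df, columns=['year', 'hour'], dtype=int)
print(list(encoded.columns))
```

An alternative under the same assumption is to cast the columns with `astype('category')` first, after which the default call picks them up.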