
I have a DataFrame like this:

import pandas as pd

df = pd.DataFrame({'year': [1894, 1976, 1995, 2001, 1993]})

The current dataframe

    year
0   1894
1   1976
2   1995
3   2001
4   1993

How can I efficiently add one-hot encoded columns so that the DataFrame looks like this?

The expected dataframe

    year    1800s   1900s   2000s
0   1894      1       0       0
1   1976      0       1       0
2   1995      0       1       0
3   2001      0       0       1
4   1993      0       1       0

I tried the code below and it works, but I suspect there is a better solution. Which function would you recommend? Thank you!

The code

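# Convert to string so the first two characters (the century) can be sliced
df['year'] = df['year'].astype(str)

# One 0/1 column per century, based on the first two characters
df['1800s'] = df['year'].apply(lambda x: 1 if x[:2] == '18' else 0)
df['1900s'] = df['year'].apply(lambda x: 1 if x[:2] == '19' else 0)
df['2000s'] = df['year'].apply(lambda x: 1 if x[:2] == '20' else 0)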
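For comparison, a minimal sketch of the same idea without apply and without turning year into strings: compute the century once with integer division and compare it against each target century. It assumes the centuries of interest (18, 19, 20) are known in advance; the names century and c are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({'year': [1894, 1976, 1995, 2001, 1993]})

# Century key as an integer: 1894 // 100 == 18, 2001 // 100 == 20
century = df['year'] // 100

# One indicator column per assumed century; the comparison is vectorized
for c in (18, 19, 20):
    df[f'{c}00s'] = (century == c).astype(int)

print(df)
```

Unlike the string version, year stays an integer column.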
dzakyputra

1 Answer


Use integer division by 100 to get the first two digits (the century), pass the result to get_dummies, rename the columns with DataFrame.add_suffix, and finally use DataFrame.join to add them to the original DataFrame:

# dtype=int keeps 0/1 values; since pandas 2.0 get_dummies returns booleans by default
df = df.join(pd.get_dummies(df['year'] // 100, dtype=int).add_suffix('00s'))
print(df)
   year  1800s  1900s  2000s
0  1894      1      0      0
1  1976      0      1      0
2  1995      0      1      0
3  2001      0      0      1
4  1993      0      1      0

print(df['year'] // 100)
0    18
1    19
2    19
3    20
4    19
Name: year, dtype: int64
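Because // floors the result, the century boundaries fall on the exact hundreds: 1800 through 1899 map to 18, and 2000 maps to 20 (so it lands in the 2000s column). A quick check on a few hypothetical boundary years:

```python
import pandas as pd

years = pd.Series([1800, 1899, 1900, 1999, 2000])

# Floor division by 100 drops the last two digits
print((years // 100).tolist())  # [18, 18, 19, 19, 20]
```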

print(pd.get_dummies(df['year'] // 100, dtype=int).add_suffix('00s'))
   1800s  1900s  2000s
0      1      0      0
1      0      1      0
2      0      1      0
3      0      0      1
4      0      1      0
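Note that get_dummies only creates columns for centuries that actually occur in the data. If the 1800s/1900s/2000s columns must always exist (for example, when encoding a later batch that happens to contain no 1800s years), one option is to reindex the dummy columns against a fixed list and fill the gaps with 0. This is a sketch that assumes the full set of columns is known ahead of time; the columns list below is hypothetical.

```python
import pandas as pd

# A batch with no 1800s years at all
df = pd.DataFrame({'year': [1976, 2001]})

# Hypothetical fixed column set, assumed known ahead of time
columns = ['1800s', '1900s', '2000s']

dummies = (
    pd.get_dummies(df['year'] // 100, dtype=int)
    .add_suffix('00s')
    .reindex(columns=columns, fill_value=0)  # add missing centuries as all-0 columns
)
df = df.join(dummies)
print(df)
```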
jezrael