-1

I have the following dataset, with multiple IDs, Dates, and other Columns. In one of my columns (Col_X), I have markers 1,2,3, 4, which mark whether a Date is in quarter 1,2,3 or 4, respectively.

ID Dates Col_X Col_Y
2038 2012-01-02 1 InfoY
2038 2012-04-13 2 InfoY
2038 2012-02-27 1 InfoY
2120 2005-08-05 3 InfoY
2120 2009-11-31 4 InfoY

What I want to do is, I want to add 4 new columns (Q1-Q4), one per quarter, that assign 1 or 0 based on the information that I have in Col_X. To put it more simply, in Q1: I want to have 1 if I have 1 in Col_X, else 0; in Q2: I want to have 1 if I have 2 in Col_X, else 0; etc.

I tried the following which saves the marker in "Col_X" based on "Dates", but I can't find a way to iterate through Col_X and do one-hot encoding in separate Q1-Q4 columns.

Any ideas, would be greatly appreciated.

Leonardo
  • 119
  • 10

1 Answers1

2

You can use pandas.get_dummies after conversion to quarter using pandas.to_datetime+dt.quarter:

NB. I changed 2009-11-31 into 2009-11-30 as the first one is not a valid date

df.join(pd.get_dummies(pd.to_datetime(df['Dates']).dt.quarter).add_prefix('Q'))

output:

     ID       Dates  Col_X  Col_Y  Q1  Q2  Q3  Q4
0  2038  2012-01-02      1  InfoY   1   0   0   0
1  2038  2012-04-13      2  InfoY   0   1   0   0
2  2038  2012-02-27      1  InfoY   1   0   0   0
3  2120  2005-08-05      3  InfoY   0   0   1   0
4  2120  2009-11-30      4  InfoY   0   0   0   1

edit: simpler method as there is already column 'Col_X':

df = pd.get_dummies(df, columns=['Col_X'], prefix='Q', prefix_sep='')
mozway
  • 194,879
  • 13
  • 39
  • 75
  • 1
    @Leonardo be aware that some quarters columns might be missing if you don't have all existing quarters in the data, in this case `reindex` the columns before joining ;) – mozway Nov 12 '21 at 12:40
  • 2
    @jezrael :D I had not paid attention that there was a column with the extracted quarters ;) – mozway Nov 12 '21 at 12:55