3

I have a dataframe

df = {'Month': [1, 8], 'Year': [2015, 2020]}
df = pd.DataFrame(data = df)
df

Would like to mutate it as a new column.

Desired output:

df = {'Month': [1, 8], 'Year': [2015, 2020], 'Quarter': ['Q12015', 'Q32020']}
df = pd.DataFrame(data = df)
df

Also would be grateful for comments what is the best way to treat this data for modelling with scikit-learn. For now I am thinking about strings and categorical variables without dummy variables, but appreciate any tips here. Thanks!

Anakin Skywalker
  • 2,400
  • 5
  • 35
  • 63

2 Answers2

6

the basic idea is to convert to datetime and the convert to_period with Q as Quarter (you can still optimize this I think)

df['Quarter'] = (pd.to_datetime(df[['Month','Year']].astype(str)
                   .agg('-'.join,1).radd("01-"),dayfirst=True).dt.to_period('Q'))

Or a simpler and better way as @Ben.T commented:

df['Quarter'] = pd.to_datetime(df[['Month','Year']].assign(day=1)).dt.to_period('Q')
print(df)

   Month  Year Quarter
0      1  2015  2015Q1
1      8  2020  2020Q3
anky
  • 74,114
  • 11
  • 41
  • 70
2
df['Quarter'] = df[['Year', 'Month']].astype(str).apply('-'.join,1)
df['Quarter'] = pd.PeriodIndex(pd.to_datetime(df['Quarter']), freq ='Q')

Andre
  • 247
  • 2
  • 6