
EDIT:
If you're coming to this question and your string looks like 1996-Q1, just use pd.to_datetime(df['Quarter']) to convert it to a proper pandas datetime. This question is about handling quarter strings that are not in this standard year-first format.

ORIGINAL QUESTION:
I'm looking for a nice, readable and understandable way (one that you can remember next time) to convert Q3 1996 to a pandas datetime, in this case 1996-07-01. So far I have found this, but it's mighty ugly:

import pandas as pd

df = pd.DataFrame({'Quarter': ['Q3 1996', 'Q4 1996', 'Q1 1997']})

# Reverse 'Q3 1996' into '1996Q3', which pd.to_datetime can parse
df['date'] = pd.to_datetime(
    df['Quarter'].str.split(' ').apply(lambda x: ''.join(x[::-1]))
)

print(df)
   Quarter       date
0  Q3 1996 1996-07-01
1  Q4 1996 1996-10-01
2  Q1 1997 1997-01-01

I was hoping the following would work, because it's readable, but unfortunately it doesn't:

df['date'] = pd.to_datetime(df['Quarter'], format='%q %Y')

The problem is also that quarter and year are apparently in the wrong order for pandas to do simple processing.

Can anyone help me find a cleaner way of converting Q3 1996 to a pandas datetime?

Sander van den Oord

3 Answers


You can (and should) use pd.PeriodIndex as a first step, then convert to timestamp using PeriodIndex.to_timestamp:

qs = df['Quarter'].str.replace(r'(Q\d) (\d+)', r'\2-\1', regex=True)  # regex=True is required on pandas >= 2.0
qs

0    1996-Q3
1    1996-Q4
2    1997-Q1
Name: Quarter, dtype: object

df['date'] = pd.PeriodIndex(qs, freq='Q').to_timestamp()
df

   Quarter       date
0  Q3 1996 1996-07-01
1  Q4 1996 1996-10-01
2  Q1 1997 1997-01-01

The initial replace step is necessary because PeriodIndex expects your periods in the year-first %Y-%q format (e.g. 1996-Q3).
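Putting the two steps above together, here is a self-contained sketch. It assumes pandas 2.0 or later, where Series.str.replace treats the pattern as a literal string unless you pass regex=True, so that flag is set explicitly.

```python
import pandas as pd

df = pd.DataFrame({'Quarter': ['Q3 1996', 'Q4 1996', 'Q1 1997']})

# Move the year in front of the quarter: 'Q3 1996' -> '1996-Q3'.
# Without regex=True, pandas >= 2.0 would look for the literal text '(Q\d) (\d+)'.
qs = df['Quarter'].str.replace(r'(Q\d) (\d+)', r'\2-\1', regex=True)

# Parse as quarterly periods, then take the first instant of each quarter.
df['date'] = pd.PeriodIndex(qs, freq='Q').to_timestamp()
print(df)
```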


Another option is to use pd.to_datetime after performing string replacement in the same way as before.

df['date'] = pd.to_datetime(
    df['Quarter'].str.replace(r'(Q\d) (\d+)', r'\2-\1', regex=True), errors='coerce')
df

   Quarter       date
0  Q3 1996 1996-07-01
1  Q4 1996 1996-10-01
2  Q1 1997 1997-01-01

If performance is important, you can split and join in a plain list comprehension, which avoids the slower pandas string methods and still reads cleanly:

df['date'] = pd.to_datetime([
    '-'.join(x.split()[::-1]) for x in df['Quarter']])

df

   Quarter       date
0  Q3 1996 1996-07-01
1  Q4 1996 1996-10-01
2  Q1 1997 1997-01-01
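When the column is long but holds only a few distinct quarters (common in panel data), another option is to parse each distinct string once and map the results back. This caching idea is a sketch of my own, not part of the answer above; the helper name quarters_to_start_dates is hypothetical, and it assumes every entry looks like 'Qn YYYY' with no missing values.

```python
import pandas as pd


def quarters_to_start_dates(quarters: pd.Series) -> pd.Series:
    """Hypothetical helper: map 'Qn YYYY' strings to quarter start dates."""
    uniq = pd.unique(quarters)  # each distinct 'Qn YYYY' string once
    year_first = ['-'.join(q.split()[::-1]) for q in uniq]  # 'Q3 1996' -> '1996-Q3'
    starts = pd.PeriodIndex(year_first, freq='Q').to_timestamp()
    lookup = dict(zip(uniq, starts))  # parse once, reuse for every repeat
    return quarters.map(lookup)


df = pd.DataFrame({'Quarter': ['Q3 1996', 'Q4 1996', 'Q3 1996', 'Q1 1997']})
df['date'] = quarters_to_start_dates(df['Quarter'])
print(df)
```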
cs95
  • Both answers are great. Why does pd.to_datetime() need to have Year and Quarter switched around for it to work? – Sander van den Oord Dec 22 '18 at 19:15
  • @SandervandenOord I think it has to do with the underlying datetime parser being used (`pytz` if I'm not mistaken). But I am not sure. There is no way I'm aware of to specify a format for the PeriodIndex, but it would be nice if you could. – cs95 Dec 22 '18 at 19:16
  • How can I get dates corresponding to the end of the quarter? Like, Q1 2018 turns into 2018-03-31? – ifly6 Feb 07 '19 at 19:02
  • @ifly6 Had the same question, just put `to_timestamp(how='end')` – User2321 Nov 07 '19 at 15:51
  • @cs95 First solution does not seem to be working anymore: df['date'] = pd.PeriodIndex(qs, freq='Q') I'm getting 'Incorrect dtype'. Using pandas 0.25.3 and pytz 2019.2 Do you have an idea why this is not working anymore? Or am I making a mistake? – Sander van den Oord Nov 26 '19 at 12:03
  • @SandervandenOord there might be something up with qs, I'd need more information. Otherwise, I'll be on my machine in a few hours so if you can wait, I can check it out for you. – cs95 Nov 26 '19 at 15:03
  • @cs95 I am not in a hurry, just curious why a well appreciated solution of you is partly not working anymore (if I'm not mistaken). – Sander van den Oord Nov 27 '19 at 09:08
  • It seems to me that I need one additional Q before \1 for this to work properly: `df['Quarter'].str.replace(r'(Q\d) (\d+)', r'\2-Q\1')`, otherwise \1 will be converted to the given month. At least with pd.to_datetime in pandas 1.3.0. – Arpad Horvath -- Слава Україні Jul 19 '21 at 19:32
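On the end-of-quarter question in the comments, here is a minimal sketch of the to_timestamp(how='end') suggestion. It assumes a reasonably recent pandas, where how='end' returns the last instant of the quarter (a time just before midnight on the last day) rather than midnight itself; calling normalize() afterwards keeps only the date, which is usually what you want for values like 2018-03-31.

```python
import pandas as pd

quarters = pd.Series(['Q1 2018', 'Q2 2018'])
year_first = ['-'.join(q.split()[::-1]) for q in quarters]  # 'Q1 2018' -> '2018-Q1'
periods = pd.PeriodIndex(year_first, freq='Q')

ends = periods.to_timestamp(how='end')  # last instant of each quarter
end_dates = ends.normalize()            # midnight of the last day of each quarter
print(end_dates)
```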

Given a quarter format like 2018-Q1, one can use the built-in pd.to_datetime function. A general answer would have to deal with the plethora of ways one can store a quarter-year observation (e.g. 2018:1, 2018:Q1, 20181, Q1:2018, etc.), so coercing the data into the format above is outside the scope of this answer.

But given a formatted series:

formatted_series = formatted_series_supplier()  # placeholder: any Series of 'YYYY-Qn' strings
df['date'] = pd.to_datetime(formatted_series)

For example:

>>> pd.to_datetime(pd.Series(['2018-Q1']))
0   2018-01-01
dtype: datetime64[ns]
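For the specific layouts listed above, one possible way to do that coercion is a small regex-based normalizer. This is only a sketch: normalize_quarter is a hypothetical helper, and it assumes a four-digit year and a single quarter digit from 1 to 4, with the quarter either prefixed by Q at the start or trailing the year.

```python
import re

# Quarter first, e.g. 'Q1:2018' or 'Q3 1996'.
_QUARTER_FIRST = re.compile(r'^Q([1-4])\D*(\d{4})$')
# Year first, e.g. '2018:1', '2018:Q1', '20181', '1996-Q3' (\D* absorbs ':', '-', 'Q').
_YEAR_FIRST = re.compile(r'^(\d{4})\D*([1-4])$')


def normalize_quarter(text: str) -> str:
    """Hypothetical helper: rewrite a quarter-year string as 'YYYY-Qn'."""
    s = text.strip().upper()
    m = _QUARTER_FIRST.match(s)
    if m:
        return f'{m.group(2)}-Q{m.group(1)}'
    m = _YEAR_FIRST.match(s)
    if m:
        return f'{m.group(1)}-Q{m.group(2)}'
    raise ValueError(f'unrecognised quarter string: {text!r}')


for raw in ['2018:1', '2018:Q1', '20181', 'Q1:2018', 'Q3 1996']:
    print(raw, '->', normalize_quarter(raw))
```

You would then map it over the raw column before converting, e.g. `pd.to_datetime(df['Quarter'].map(normalize_quarter))`. Raising on unrecognised input is a deliberate choice so that malformed rows surface instead of silently becoming NaT.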

And if you're dealing with regulatory data, which almost always reflects the end of the quarter rather than its start (i.e. instead of 2019-01-01, you want 2019-03-31), you can use offsets like below:

df['date'] = df['date'] + pd.offsets.QuarterEnd(0)

Applying this to the example above in a single step:

>>> pd.to_datetime(pd.Series(['2018-Q1'])) + pd.offsets.QuarterEnd(0)
0   2018-03-31
dtype: datetime64[ns]

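Note that you must pass 0 to QuarterEnd if dates that already fall on a quarter end should stay in the same quarter. With n=0 the offset rolls forward to the end of the current quarter and leaves quarter-end dates unchanged; with the default n=1 it always moves to the next quarter end, so you'll get something like this: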

>>> pd.to_datetime('2018-03-31') + pd.offsets.QuarterEnd()
Timestamp('2018-06-30 00:00:00')
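A small sketch contrasting the two offsets on a date inside a quarter and a date already on a quarter end. It relies on QuarterEnd's default startingMonth=3, i.e. calendar quarters ending in March, June, September and December.

```python
import pandas as pd

dates = pd.Series(pd.to_datetime(['2018-01-01', '2018-03-31']))

# n=0: roll forward to the end of the current quarter; quarter-end dates stay put.
rolled = dates + pd.offsets.QuarterEnd(0)   # 2018-03-31, 2018-03-31

# Default n=1: always advance to the *next* quarter end.
advanced = dates + pd.offsets.QuarterEnd()  # 2018-03-31, 2018-06-30

print(rolled)
print(advanced)
```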
ifly6

Slice the last 4 characters (the year) and the first 2 (the quarter), concatenate them into strings like 1996Q3, and convert to datetimes:

df['date'] = pd.to_datetime(df['Quarter'].str[-4:] + df['Quarter'].str[:2])

String operations in pandas are slow, so if the column cannot contain missing values, use a list comprehension instead:

# Python 3.6+ (f-strings)
df['date'] = pd.to_datetime([f'{x[-4:]}{x[:2]}' for x in df['Quarter']])
# Python below 3.6
#df['date'] = pd.to_datetime(['{}{}'.format(x[-4:], x[:2]) for x in df['Quarter']])
print(df)
   Quarter       date
0  Q3 1996 1996-07-01
1  Q4 1996 1996-10-01
2  Q1 1997 1997-01-01
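If you'd rather not rely on pandas recognising quarter strings at all, a possible alternative (a different technique from the slicing above, not the answer's own method) is to compute the first month of each quarter arithmetically and let pd.to_datetime assemble dates from year/month/day columns. This sketch assumes every entry has the exact 'Qn YYYY' layout.

```python
import pandas as pd

df = pd.DataFrame({'Quarter': ['Q3 1996', 'Q4 1996', 'Q1 1997']})

year = df['Quarter'].str[-4:].astype(int)    # 'Q3 1996' -> 1996
quarter = df['Quarter'].str[1].astype(int)   # 'Q3 1996' -> 3
first_month = 3 * (quarter - 1) + 1          # Q1->1, Q2->4, Q3->7, Q4->10

# pd.to_datetime accepts a DataFrame whose columns are named year, month and day.
df['date'] = pd.to_datetime(pd.DataFrame({'year': year, 'month': first_month, 'day': 1}))
print(df)
```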
jezrael