1

I have looked at many threads but couldn't solve my problem. My df has the following headers:

 Index(['2005 - Q2', '2005 - Q3', '2005 - Q4', '2006 - Q1', '2006 - Q2',
       '2006 - Q3', '2006 - Q4', '2007 - Q1', '2007 - Q2', '2007 - Q3',
       '2007 - Q4', '2008 - Q1', '2008 - Q2', '2008 - Q3', '2008 - Q4',
       '2009 - Q1', '2009 - Q2', '2009 - Q3', '2009 - Q4', '2010 - Q1',
       '2010 - Q2', '2010 - Q3', '2010 - Q4', '2011 - Q1', '2011 - Q2',
       '2011 - Q3', '2011 - Q4', '2012 - Q1', '2012 - Q2', '2012 - Q3',
       '2012 - Q4', '2013 - Q1', '2013 - Q2', '2013 - Q3', '2013 - Q4',
       '2014 - Q1', '2014 - Q2', '2014 - Q3', '2014 - Q4', '2015 - Q1',
       '2015 - Q2', '2015 - Q3', '2015 - Q4', '2016 - Q1', '2016 - Q2',
       '2016 - Q3', '2016 - Q4', '2017 - Q1', '2017 - Q2', '2017 - Q3',
       '2017 - Q4', '2018 - Q1', '2018 - Q2', '2018 - Q3', '2018 - Q4',
       '2019 - Q1', '2019 - Q2', '2019 - Q3', '2019 - Q4', '2020 - Q1',
       '2020 - Q2', '2020 - Q3', '2020 - Q4'],
      dtype='object')

I want to convert these into dates so that I can perform a time series analysis.

THere is one more column before the headers listed in the Index. This first column is of different nature and doesn;t need to be converted to date. How can I exclude the very first header?

Also all these are columns headers that have a value in the rows below (in case this helps).

LockedDown
  • 21
  • 3
  • What does that _"THere is one more column before the headers listed in the Index"_ mean? Do you have a MultiIndex? Or do you mean columns when you speak about the _headers_? If that's the case then do something like `df.columns = [df.columns[0]] + pd.to_datetime(df.columns[1:].str.replace(' ', '')).to_list()`. – Timus Jul 22 '21 at 14:49

1 Answers1

1

You can do this:

ix = pd.to_datetime(
    ser.index.str.replace(' ', '', regex=False), errors='coerce'
)

# or if your format would be more complicated, you might need regexp
ix = pd.to_datetime(
    ser.index.str.replace(r'(\d+) - Q(\d)', r'\1-Q\2', regex=True), errors='coerce'
)

ser2 = pd.Series(
    index=ix,
    data=ser.values
)

I assumed the original series is ser. E.g.

ser = pd.Series(
    index=['2020 - Q2', '2020 - Q3', '2020 - Q4'],
    data=range(3)
)

I used this answer to answer yours.

You might want to use the last day / last business day of the quarter. In that case you can use offsets like QuarterEnd / BQuarterEnd