I have a dataset that uses the format '1998:Q1', but Vega-Lite does not recognise this as a temporal variable. Is there a way to convert it to the format 1998/01 and apply this to the whole dataset?

link to dataset

Current format:

Date
1998:Q1
1998:Q2

Desired format:

Date
1998/01
1998/04

I tried:

df['Date']=pd.to_datetime(df['Date'],format = '%Y%M')
mozway
  • Please clarify your specific problem or provide additional details to highlight exactly what you need. As it's currently written, it's hard to tell exactly what you're asking. – Community Jan 02 '23 at 18:34

2 Answers

Use PeriodIndex with freq='Q', then strftime:

df['Date'] = pd.PeriodIndex(df['Date'].str.replace(':', '-'), freq='Q').strftime('%Y/%m')

Output:

        Date    Real_HPI    HPI_Rent  HPI_Income
0    1982/03   56.915399   73.676471   91.525857
1    1982/06   57.617379   73.555343   92.184294
2    1982/09   55.792959   70.956878   89.093476
3    1982/12   57.685409   74.935861   92.330045
4    1983/03   58.654319   75.943255   93.027588
..       ...         ...         ...         ...
157  2021/06  110.749697   94.707024   87.563696
158  2021/09  115.405297   99.181098   92.331803
159  2021/12  117.743697  101.504486   95.433863
160  2022/03  120.107097  103.799870   99.858203
161  2022/06  125.751597  108.749527  104.841177

[162 rows x 4 columns]
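
Note that the output above shows the last month of each quarter (1982/03 for Q1), whereas the question asks for the first month (1998/01 for Q1). A minimal sketch of one way to get quarter-start months, assuming the same 'YYYY:Qn' strings, is to convert the periods to timestamps before formatting. The small sample frame here is made up for illustration and is not the linked dataset.

```python
import pandas as pd

# Hypothetical sample data in the question's 'YYYY:Qn' format.
df = pd.DataFrame({'Date': ['1998:Q1', '1998:Q2', '1998:Q3', '1998:Q4']})

# Parse the strings as quarterly periods, as in the answer above.
periods = pd.PeriodIndex(df['Date'].str.replace(':', '-'), freq='Q')

# to_timestamp() defaults to the start of each period, so formatting
# afterwards yields the first month of the quarter.
df['Date'] = periods.to_timestamp().strftime('%Y/%m')

print(df['Date'].tolist())
# ['1998/01', '1998/04', '1998/07', '1998/10']
```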
mozway

As an alternative, you can use apply() with a small helper function:

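# Map each quarter label to the first month of that quarter.
QUARTER_START = {"Q1": "01", "Q2": "04", "Q3": "07", "Q4": "10"}

def fun(var):
    # Split '1998:Q1' into the year and the quarter label.
    year, quarter = var.split(":")
    # An unknown quarter label raises KeyError instead of silently returning None.
    return year + "/" + QUARTER_START[quarter]

# Apply the conversion to every row and store the result in a new column.
df['new_date'] = df['Date'].apply(fun)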

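
Since the underlying goal is a column that gets treated as temporal, it may also help to parse the 'YYYY/MM' strings into real datetimes. This is only a sketch: it assumes a `new_date` column like the one produced above, uses made-up sample values, and assumes the chart is built from the DataFrame with a tool such as Altair (the question only mentions Vega-Lite).

```python
import pandas as pd

# Hypothetical sample of the converted 'YYYY/MM' strings.
df = pd.DataFrame({'new_date': ['1998/01', '1998/04', '1998/07', '1998/10']})

# Parse year/month strings; each value becomes the first day of that month.
df['new_date'] = pd.to_datetime(df['new_date'], format='%Y/%m')

print(df['new_date'].dt.month.tolist())
# [1, 4, 7, 10]
```

If the strings are kept as they are, declaring the field type as temporal in the chart encoding is another option to try.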