0

I have two series, one of which is monthly CPI (consumer price inflation) data and the other being the daily close price of EUR/USD exchange rate. The issue I am having is converting the monthly CPI data INTO DAILY so that I can combine the two series together into a dataframe. As I am using this for a regression machine learning (ML) task, namely XGBoost, I am unsure about the correct way to do this. I've read about a couple of methods: maybe interpolation, e.g. Chow-Lin or perhaps Cubic Spline, I've heard of using a Kalman Filter, I have seen people manipulating the datetime index and filling it with NaN values and then using ffill() to fill in the NaN's, and there's training separate models (which I don't want to do) etc.....

I really don't know the correct procedure to do this, especially when time disaggregation is of a major concern in relation to model accuracy. Here is the code:

from datetime import datetime
import pandas as pd
import pandas_datareader.data as pdr
import yfinance as yf

eurusd = yf.download("EURUSD=X", start=datetime(2000, 1, 1), end=datetime(2021, 7, 21))["Close"]  # daily
cpi = pdr.FredReader("CPALTT01USM657N", start=datetime(2000, 1, 1), end=datetime(2021, 7, 21)).read()  # monthly

The output of the data is as follows:

(N.B: I will eventually slice the dataframe and have data from around 2008 onwards for the sake of feature engineering and for more recent data for the ML model)

Date
2003-12-01    1.196501
2003-12-02    1.208897
2003-12-03    1.212298
2003-12-04    1.208094
2003-12-05    1.218695
                ...   
2021-07-15    1.183334
2021-07-16    1.181181
2021-07-19    1.181401
2021-07-20    1.179384
2021-07-21    1.178411
Name: Close, Length: 4554, dtype: float64
            CPALTT01USM657N
DATE                       
2000-01-01         0.297089
2000-02-01         0.592417
2000-03-01         0.824499
2000-04-01         0.058411
2000-05-01         0.116754
...                     ...
2021-01-01         0.425378
2021-02-01         0.547438
2021-03-01         0.708327
2021-04-01         0.821891
2021-05-01         0.801711

[257 rows x 1 columns]

Really appreciate all the help that I can get! Many thanks.

0 Answers0