I have two series, one of which is monthly CPI (consumer price inflation) data and the other being the daily close price of EUR/USD exchange rate. The issue I am having is converting the monthly CPI data INTO DAILY so that I can combine the two series together into a dataframe. As I am using this for a regression machine learning (ML) task, namely XGBoost, I am unsure about the correct way to do this. I've read about a couple of methods: maybe interpolation, e.g. Chow-Lin or perhaps Cubic Spline, I've heard of using a Kalman Filter, I have seen people manipulating the datetime index and filling it with NaN values and then using ffill() to fill in the NaN's, and there's training separate models (which I don't want to do) etc.....
I really don't know the correct procedure to do this, especially when time disaggregation is of a major concern in relation to model accuracy. Here is the code:
from datetime import datetime
import pandas as pd
import pandas_datareader.data as pdr
import yfinance as yf
eurusd = yf.download("EURUSD=X", start=datetime(2000, 1, 1), end=datetime(2021, 7, 21))["Close"] # daily
cpi = pdr.FredReader("CPALTT01USM657N", start=datetime(2000, 1, 1), end=datetime(2021, 7, 21)).read() # monthly
The output of the data is as follows:
(N.B: I will eventually slice the dataframe and have data from around 2008 onwards for the sake of feature engineering and for more recent data for the ML model)
Date
2003-12-01 1.196501
2003-12-02 1.208897
2003-12-03 1.212298
2003-12-04 1.208094
2003-12-05 1.218695
...
2021-07-15 1.183334
2021-07-16 1.181181
2021-07-19 1.181401
2021-07-20 1.179384
2021-07-21 1.178411
Name: Close, Length: 4554, dtype: float64
CPALTT01USM657N
DATE
2000-01-01 0.297089
2000-02-01 0.592417
2000-03-01 0.824499
2000-04-01 0.058411
2000-05-01 0.116754
... ...
2021-01-01 0.425378
2021-02-01 0.547438
2021-03-01 0.708327
2021-04-01 0.821891
2021-05-01 0.801711
[257 rows x 1 columns]
Really appreciate all the help that I can get! Many thanks.