I'm working on pulling financial data, in which some is formatted in quarterly and other is daily. My model will need all of it daily, therefore I need that same quarterly value repeated for each day. I've been using this stack post and trying to adapt the code to my data.
Here is my dataframe head:
date ticker value
0 31/03/1980 ECB/RA6 1.0
1 30/06/1980 ECB/RA6 4.0
2 30/09/1980 ECB/RA6 2.0
3 30/12/1980 ECB/RA6 3.0
4 31/03/1981 ECB/RA6 2.0
Here is my desired output looks like:
date ticker value
0 01/01/1980 ECB/RA6 1.0
1 02/01/1980 ECB/RA6 1.0
2 03/01/1980 ECB/RA6 1.0
3 04/01/1980 ECB/RA6 1.0
4 05/01/1980 ECB/RA6 1.0
. . . .
. . . .
. . . .
91 01/04/1980 ECB/RA6 4.0
And my code:
df['date'] = pd.to_datetime(df['date'], format='%d/%m/%Y')
df = df.pivot(index='date', columns='ticker')
start_date = df.index.min() - pd.DateOffset(day=1)
end_date = df.index.max() + pd.DateOffset(day=31)
dates = pd.date_range(start_date, end_date, freq='d')
dates.name = 'date'
df = df.reindex(dates, method='ffill')
df = df.stack('ticker')
df = df.sortlevel(level=1)
df = df.reset_index()
I see now what the problem is, but it shoudn't be a problem i believe. I run the following code from @Pierre D(after removing duplicates):
df = df.set_index('date') # assuming 'date' is a proper Timestamp
df.index = df.index.to_period('Q') # turn index into PeriodIndex('Q')
df = df.set_index('ticker', append=True).squeeze()
df2 = df[df.duplicated( keep = False)]
I get the following output for df2:
value value2
date ticker
1997Q2 AAPL 46850 NaN
1997Q3 AAPL 46850 NaN
2003Q1 MSFT 10137 19/12/2003
2003Q2 MSFT 10137 19/12/2003
As you can see the index is different, but value and value2 are equal in these instances. This should not be a problem I believe, but when I now run:
df.unstack()
I get the following error: "ValueError: Index contains duplicate entries, cannot reshape"
Thanks in advance everyone!