Suppose I have a Pandas dataframe with 'Date'
column whose values have gaps like below:
>>> import pandas as pd
>>> data = [['2021-01-02', 1.0], ['2021-01-05', 2.0], ['2021-02-05', 3.0]]
>>> df = pd.DataFrame(data, columns=['Date','$'])
>>> df
Date $
0 2021-01-02 1.0
1 2021-01-05 2.0
2 2021-02-05 3.0
I would like to fill the gaps in the 'Date'
column from the period between Jan 01, 2021 to Feb 28, 2021 while copying (forward-filling) the values, so from some reading up on StackOverflow posts like this, I came up with this solution to transform the dataframe as shown below:
# I need to first convert values in 'Date' column to datetime64 type
>>> df['Date'] = pd.to_datetime(df['Date'])
# Then I have to set 'Date' column as the dataframe's index
>>> df = df.set_index(['Date'])
# Without doing the above two steps, the call below returns error
>>> df_new=df.asfreq(freq='D', how={'start':'2021-01-01', 'end':'2021-03-31'}, method='ffill')
>>> df_new
$
Date
2021-01-02 1.0
2021-01-03 1.0
2021-01-04 1.0
2021-01-05 2.0
2021-01-06 2.0
2021-01-07 2.0
2021-01-08 2.0
2021-01-09 2.0
2021-01-10 2.0
...
2021-01-31 2.0
2021-02-01 2.0
2021-02-02 2.0
2021-02-03 2.0
2021-02-04 2.0
2021-02-05 3.0
But as you can see above, the dates in df_new
only starts at '2021-01-02'
instead of '2021-01-01'
AND it ends on '2021-02-05'
instead of '2021-02-28'
. I hope I'm entering the input for how
parameter correctly above.
Q1: What else do I need to do to make the resulting dataframe look like below:
>>> df_new
$
Date
2021-01-01 1.0
2021-01-02 1.0
2021-01-03 1.0
2021-01-04 1.0
2021-01-05 2.0
2021-01-06 2.0
2021-01-07 2.0
2021-01-08 2.0
2021-01-09 2.0
2021-01-10 2.0
...
2021-01-31 2.0
2021-02-01 2.0
2021-02-02 2.0
2021-02-03 2.0
2021-02-04 2.0
2021-02-05 3.0
2021-02-06 3.0
...
2021-02-28 3.0
Q2: Is there any way I can accomplish this simpler (i.e. without having to set the 'Date'
column as the index of the dataframe for example)
Thanks in advance for your suggestions/answers!