My data is STRAVA activities in a dataframe with the index set as the date of the activity. I want to insert rows that are indexed with the date that is missing i.e. my dataframe would be indexed at a frequency of days from oldest to newest in the original data. I have tried the following methods from two other posts here Add missing dates to pandas dataframe and here pandas fill missing dates in time series
However the issue I run into is as follows. Because on some dates two activities occur the index label for the row no longer is unique and is duplicated.
2020-12-20 8.00000
2020-12-20 7.00000
2020-12-22 6.50000
2020-12-22 9.00000
2020-12-23 4.00000
2020-12-24 3.00000
So when I run df.reindex(index=pd.date_range(df.index.min(),df.index.max(), fill=0, I get the dates inserted but lose the duplicates. Similarly with df.index.asfreq("D"). I get the same issue.
The only solution I have found is using df.align() as parsing in a series with index labels at frequencies of days as below. Then I return the first dataframe in the tuple producing the following result.
dates = pd.date_range(df.index.min(), df.index.max(), freq = "D")
ts = pd.Series(np.random.randint(0,500), len(dates)), index=dates)
tuple = df.align(ts, axis=0)
newdf = tuple[0]
2020-12-20 8.00000
2020-12-20 7.00000
2020-12-20 NaN
2020-12-21 NaN
2020-12-22 6.50000
2020-12-22 9.00000
2020-12-22 NaN
2020-12-23 4.00000
2020-12-23 NaN
2020-12-24 3.00000
2020-12-24 NaN
Is this the only solution? Is there a way to ignore duplicate indexes but continue to insert rows with dates that are MISSING and therefore not end up with even more duplicate date indexs?