I need a fast approach to add rows for all the dates between two columns. I have already tried the following approaches:
df.index.repete: add rows for all dates between two columns?
pd.melt: How can I add rows for all dates between two columns?
date_range + merge: Get all dates between startdate and enddate columns
itertuples: Add rows to dataframe each day between a range of two columns
SQL script using sqlite: Crashed the machine when trying to do a join between two dates.
When they worked they were all very slow (between 25 to 75 minutes), I have 1,2 million rows in my dataframe which extends to over 30 million. Is there anyway to do any of these solutions using something like apply or vectorization? Is there a way of doing this in just a few minutes max?
Data has following structure:
start_date end_date ID
The final structure should be:
start_date end_date Date ID