The raw dataset is below:
DF
When the start and end dates differ, we require daily granularity. Daily granularity ensures each row has the same start and end date.
x = np.timedelta64(2069211000000000, 'ns')
days = x.astype('timedelta64[D]')
days / np.timedelta64(1, 'D')
df['datedifference'] = (df['End Date'] - df['Start Date'])
df['datedifference2'] = ((df['datedifference']/days)*23) + 1
df2 = df.loc[df.index.repeat(df['datedifference2'])].reset_index(drop = True)
del df2["datedifference"]
del df2["datedifference2"]
The above snippet results in the correct number of rows per campaign, but the start and end dates repeat.
DF2
This is where I'm stuck. I would like to see results like below:
DF3
Code tried:
date = pd.date_range('2022-03-31', '2022-04-10', freq="AS")
freq = 'D'
date = date + pd.Timedelta(3, unit=freq)
Error Received: Addition/subtraction of integers and integer-arrays with Timestamp is no longer supported. Instead of adding/subtracting n
, use n * obj.freq
Same error as in this Stack Overflow question
The code referenced in the above stack overflow does not work for my dataset, as my data is not linear.
I've tried so many different snippets, with no luck. I've started considering extracting month, day, and year into separate columns, but this would be way more convoluted than I'd like. Open to suggestions!