1

The raw dataset is below:

DF Image of DF

When the start and end dates differ, we require daily granularity. Daily granularity ensures each row has the same start and end date.

x = np.timedelta64(2069211000000000, 'ns')
days = x.astype('timedelta64[D]')
days / np.timedelta64(1, 'D')
df['datedifference'] = (df['End Date'] - df['Start Date'])
df['datedifference2'] = ((df['datedifference']/days)*23) + 1 

df2 = df.loc[df.index.repeat(df['datedifference2'])].reset_index(drop = True)
del df2["datedifference"]
del df2["datedifference2"]

The above snippet results in the correct number of rows per campaign, but the start and end dates repeat.

DF2 Image of DF2

This is where I'm stuck. I would like to see results like below:

DF3 Image of DF3

Code tried:

date = pd.date_range('2022-03-31', '2022-04-10', freq="AS")   
freq = 'D'                                                   
date = date + pd.Timedelta(3, unit=freq)    

            

Error Received: Addition/subtraction of integers and integer-arrays with Timestamp is no longer supported. Instead of adding/subtracting n, use n * obj.freq

Same error as in this Stack Overflow question

The code referenced in the above stack overflow does not work for my dataset, as my data is not linear.

I've tried so many different snippets, with no luck. I've started considering extracting month, day, and year into separate columns, but this would be way more convoluted than I'd like. Open to suggestions!

not_speshal
  • 22,093
  • 2
  • 15
  • 30

1 Answers1

-1

Try with apply, pd.date_range and explode:

df["Start Date"] = pd.to_datetime(df["Start Date"])
df["End Date"] = pd.to_datetime(df["End Date"])

df["Start Date"] = df.apply(lambda row: pd.date_range(row["Start Date"], row["End Date"]), axis=1)
df["End Date"] = df["Start Date"]
df = df.explode(["Start Date", "End Date"]).reset_index(drop=True)

>>> df
      Campaign Start Date   End Date
0   Campaign A 2022-03-31 2022-03-31
1   Campaign A 2022-04-01 2022-04-01
2   Campaign A 2022-04-02 2022-04-02
3   Campaign A 2022-04-03 2022-04-03
4   Campaign B 2022-03-31 2022-03-31
5   Campaign B 2022-04-01 2022-04-01
6   Campaign B 2022-04-02 2022-04-02
7   Campaign C 2022-04-06 2022-04-06
8   Campaign C 2022-04-07 2022-04-07
9   Campaign C 2022-04-08 2022-04-08
10  Campaign C 2022-04-09 2022-04-09
11  Campaign C 2022-04-10 2022-04-10
not_speshal
  • 22,093
  • 2
  • 15
  • 30