The Problem:
I have a dataframe df
that looks like this:
value msg_type
date
2022-03-15 08:15:10+00:00 122 None
2022-03-15 08:25:10+00:00 125 None
2022-03-15 08:30:10+00:00 126 None
2022-03-15 08:30:26.542134+00:00 127 ANNOUNCEMENT
2022-03-15 08:35:10+00:00 128 None
2022-03-15 08:40:10+00:00 122 None
2022-03-15 08:45:09+00:00 127 None
2022-03-15 08:50:09+00:00 133 None
2022-03-15 08:55:09+00:00 134 None
....
2022-03-16 09:30:09+00:00 132 None
2022-03-16 09:30:13.234425+00:00 135 ANNOUNCEMENT
2022-03-16 09:35:09+00:00 130 None
2022-03-16 09:40:09+00:00 134 None
2022-03-16 09:45:09+00:00 135 None
2022-03-16 09:50:09+00:00 134 None
Expected output:
value msg_type time_since_ann
date
2022-03-15 08:15:10+00:00 122 None -15
2022-03-15 08:25:10+00:00 125 None -10
2022-03-15 08:30:10+00:00 126 None -5
2022-03-15 08:30:26.542134+00:00 127 ANNOUNCEMENT 0
2022-03-15 08:35:10+00:00 128 None 5
2022-03-15 08:40:10+00:00 122 None 10
2022-03-15 08:45:09+00:00 127 None 15
2022-03-15 08:50:09+00:00 133 None 20
2022-03-15 08:55:09+00:00 134 None 25
....
2022-03-16 09:30:09+00:00 132 None -5
2022-03-16 09:30:13.234425+00:00 135 ANNOUNCEMENT 0
2022-03-16 09:35:09+00:00 130 None 5
2022-03-16 09:40:09+00:00 134 None 10
2022-03-16 09:45:09+00:00 135 None 15
2022-03-16 09:50:09+00:00 134 None 20
I want to create a new column time_since_announcement
that populates with a value from -120 to +480 in 5 minute intervals, where the 0 value aligns with the announcement. This is to enable me to eventually group by the time_since_announcement
and generate a percentile/quantile plot (not sure which terminology is correct) that shows value
on the y axis and time on the x-axis ranging from -2 hours to +8 hours. So for example if I have 5+ announcements, I want them all time-aligned at t=0 on the plot to see how value
changes over time.
What I've Tried:
- this post talks about grouping data into 5 minute intervals, which I can do, but it doesn't address my problem of having them aligned at the same
time_since_announcement
value of 0 for eachmsg_type == 'ANNOUNCEMENT'
. - this post talks about adding a column to a df with a repeating series of values, which is exactly what I want! I tried this code:
df['minutes_since_announcement']= 0
time_range = np.arange(-120, 480, 5)
np.put(df['minutes_since_announcement'], np.arange(len(df)), time_range)
but I get TypeError: argument 1 must be numpy.ndarray, not Series
.
How can I pass in argument 1 as an ndarray? The original post doesn't appear to have that problem.
Or, if anyone has better ideas about how to accomplish this, please let me know! Any advice would be greatly appreciated, thank you so much!