0

The Problem:

I have a dataframe df that looks like this:

                                  value  msg_type
date        
2022-03-15 08:15:10+00:00         122    None
2022-03-15 08:25:10+00:00         125    None
2022-03-15 08:30:10+00:00         126    None
2022-03-15 08:30:26.542134+00:00  127    ANNOUNCEMENT
2022-03-15 08:35:10+00:00         128    None
2022-03-15 08:40:10+00:00         122    None
2022-03-15 08:45:09+00:00         127    None
2022-03-15 08:50:09+00:00         133    None
2022-03-15 08:55:09+00:00         134    None
....
2022-03-16 09:30:09+00:00         132    None
2022-03-16 09:30:13.234425+00:00  135    ANNOUNCEMENT
2022-03-16 09:35:09+00:00         130    None
2022-03-16 09:40:09+00:00         134    None
2022-03-16 09:45:09+00:00         135    None
2022-03-16 09:50:09+00:00         134    None

Expected output:

                                  value  msg_type      time_since_ann
date        
2022-03-15 08:15:10+00:00         122    None          -15
2022-03-15 08:25:10+00:00         125    None          -10
2022-03-15 08:30:10+00:00         126    None          -5
2022-03-15 08:30:26.542134+00:00  127    ANNOUNCEMENT  0
2022-03-15 08:35:10+00:00         128    None          5 
2022-03-15 08:40:10+00:00         122    None          10
2022-03-15 08:45:09+00:00         127    None          15
2022-03-15 08:50:09+00:00         133    None          20
2022-03-15 08:55:09+00:00         134    None          25
....
2022-03-16 09:30:09+00:00         132    None          -5
2022-03-16 09:30:13.234425+00:00  135    ANNOUNCEMENT  0
2022-03-16 09:35:09+00:00         130    None          5
2022-03-16 09:40:09+00:00         134    None          10
2022-03-16 09:45:09+00:00         135    None          15
2022-03-16 09:50:09+00:00         134    None          20

I want to create a new column time_since_announcement that populates with a value from -120 to +480 in 5 minute intervals, where the 0 value aligns with the announcement. This is to enable me to eventually group by the time_since_announcement and generate a percentile/quantile plot (not sure which terminology is correct) that shows value on the y axis and time on the x-axis ranging from -2 hours to +8 hours. So for example if I have 5+ announcements, I want them all time-aligned at t=0 on the plot to see how value changes over time.


What I've Tried:

  • this post talks about grouping data into 5 minute intervals, which I can do, but it doesn't address my problem of having them aligned at the same time_since_announcement value of 0 for each msg_type == 'ANNOUNCEMENT'.
  • this post talks about adding a column to a df with a repeating series of values, which is exactly what I want! I tried this code:
df['minutes_since_announcement']= 0

time_range = np.arange(-120, 480, 5)

np.put(df['minutes_since_announcement'], np.arange(len(df)), time_range)

but I get TypeError: argument 1 must be numpy.ndarray, not Series.

How can I pass in argument 1 as an ndarray? The original post doesn't appear to have that problem.

Or, if anyone has better ideas about how to accomplish this, please let me know! Any advice would be greatly appreciated, thank you so much!

walker967
  • 136
  • 1
  • 10

1 Answers1

1

You can actually just assign directly to the dataframe column rather than using np.put:

df['minutes_since_announcement'] = time_range[np.arange(len(df))]

Output:

>>> df
                                  value      msg_type  minutes_since_announcement
date                                                                             
2022-03-15 08:15:10+00:00           122          None                        -120
2022-03-15 08:25:10+00:00           125          None                        -115
2022-03-15 08:30:10+00:00           126          None                        -110
2022-03-15 08:30:26.542134+00:00    127  ANNOUNCEMENT                        -105
2022-03-15 08:35:10+00:00           128          None                        -100
2022-03-15 08:40:10+00:00           122          None                         -95
2022-03-15 08:45:09+00:00           127          None                         -90
2022-03-15 08:50:09+00:00           133          None                         -85
2022-03-15 08:55:09+00:00           134          None                         -80
2022-03-16 09:30:09+00:00           132          None                         -75
2022-03-16 09:30:13.234425+00:00    135  ANNOUNCEMENT                         -70
2022-03-16 09:35:09+00:00           130          None                         -65
2022-03-16 09:40:09+00:00           134          None                         -60
2022-03-16 09:45:09+00:00           135          None                         -55
2022-03-16 09:50:09+00:00           134          None                         -50