I have a time series (df1) of passenger boarding counts with a DatetimeIndex, like below:

                     P_Boarded
Timestamp   
2019-06-12 05:01:12  2.0
2019-06-12 05:30:22  NaN
2019-06-12 06:02:10  6.0
2019-06-12 06:32:54  5.0
2019-06-12 07:03:12  8.0
2019-06-12 07:10:16  NaN
2019-06-12 07:21:04  16.0
2019-06-12 07:30:08  NaN
2019-06-12 07:41:45  24.0
2019-06-12 07:53:24  28.0
2019-06-12 08:01:01  35.0
2019-06-12 08:10:36  32.0
2019-06-12 08:20:17  41.0
2019-06-12 08:31:12  NaN
2019-06-12 08:42:42  NaN
2019-06-12 08:50:19  45.0
2019-06-12 09:04:06  37.0
2019-06-12 09:09:13  NaN
2019-06-12 09:20:58  NaN
2019-06-12 09:31:48  NaN
2019-06-12 09:43:56  36.0
2019-06-12 09:52:12  NaN
2019-06-12 10:02:35  NaN
2019-06-12 10:11:12  42.0
2019-06-12 10:22:23  NaN
...

I have resampled df1 at a 1-hour frequency and computed the median of each resampled group using the following code:

f = df1.P_Boarded.resample('1H').median().round()
f

After resampling, the data looks like:

Timestamp
2019-06-12 05:00:00     2.0
2019-06-12 06:00:00     6.0
2019-06-12 07:00:00    20.0
2019-06-12 08:00:00    38.0
2019-06-12 09:00:00    36.0
2019-06-12 10:00:00    22.0
2019-06-12 11:00:00    15.0
2019-06-12 12:00:00    18.0
2019-06-12 13:00:00    20.0
2019-06-12 14:00:00    17.0
2019-06-12 15:00:00     9.0
2019-06-12 16:00:00    32.0
2019-06-12 17:00:00    28.0
2019-06-12 18:00:00    29.0
2019-06-12 19:00:00    30.0
2019-06-12 20:00:00    26.0
2019-06-12 21:00:00    14.0
2019-06-12 22:00:00    12.0
2019-06-12 23:00:00     9.0
2019-06-13 00:00:00     2.0
2019-06-13 01:00:00     2.0

Now I want to group 'P_Boarded' by the same hourly bins and fill the NaNs in each group with that group's median.
For example, all NaNs between 5am and 6am should be filled with 2, all NaNs between 6am and 7am should be filled with 6, and so on.
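To make the desired result concrete, here is a minimal sketch on the first ten rows of the sample above. It assumes the hourly bins are obtained by flooring the index to the hour and that the median is rounded as in the resample step; the names `hour`, `hourly_median` and `filled` are only illustrative.

```python
import numpy as np
import pandas as pd

# First ten rows of the sample data shown above
idx = pd.to_datetime([
    "2019-06-12 05:01:12", "2019-06-12 05:30:22",
    "2019-06-12 06:02:10", "2019-06-12 06:32:54",
    "2019-06-12 07:03:12", "2019-06-12 07:10:16",
    "2019-06-12 07:21:04", "2019-06-12 07:30:08",
    "2019-06-12 07:41:45", "2019-06-12 07:53:24",
])
df1 = pd.DataFrame(
    {"P_Boarded": [2.0, np.nan, 6.0, 5.0, 8.0, np.nan, 16.0, np.nan, 24.0, 28.0]},
    index=idx,
)
df1.index.name = "Timestamp"

# Hour bin of every row (05:01:12 -> 05:00:00, and so on)
hour = df1.index.floor("h")

# Median of each hourly group, broadcast back to the original rows and rounded
hourly_median = df1["P_Boarded"].groupby(hour).transform("median").round()

# Replace only the missing values with the median of their own hour
filled = df1["P_Boarded"].fillna(hourly_median)
print(filled)
```

With these rows, the NaN at 05:30:22 becomes 2.0 and the NaNs at 07:10:16 and 07:30:08 become 20.0, which matches the resampled medians for those hours.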

How can I do this in Python? Please help, as I am very new to Python and just starting out.

tawab_shakeel
Prachi
  • IIUC: `df.groupby(df.index.strftime('%Y-%m-%d %H')).transform(lambda x: x.fillna(x.median()))`, you shouldn't even need the resample – user3483203 Jul 01 '19 at 16:29
  • Just replace `mean` with `median` and the dupe has you covered. I demonstrate the proper `groupby` in my previous comment – user3483203 Jul 01 '19 at 16:31
  • Thank you @user3483203 for your reply. It helped me a lot. What if I had two columns in df1, 'P_Boarded' and 'P_Alighted', and I had to perform the same operation on both? Could it be done with just a single line of code? – Prachi Jul 01 '19 at 17:08
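
As a rough sketch of the two-column case: the `groupby(...).transform(...)` pattern from the first comment is applied per column, so it should cover both columns in one statement. The 'P_Alighted' values below are made up purely for illustration (the thread shows no such data), and the `np.round` call is an assumption added to mirror the rounding in the question.

```python
import numpy as np
import pandas as pd

idx = pd.to_datetime([
    "2019-06-12 05:01:12", "2019-06-12 05:30:22",
    "2019-06-12 06:02:10", "2019-06-12 06:32:54",
    "2019-06-12 07:03:12", "2019-06-12 07:10:16",
    "2019-06-12 07:21:04", "2019-06-12 07:30:08",
    "2019-06-12 07:41:45", "2019-06-12 07:53:24",
])
df1 = pd.DataFrame(
    {
        # Values taken from the question
        "P_Boarded": [2.0, np.nan, 6.0, 5.0, 8.0, np.nan, 16.0, np.nan, 24.0, 28.0],
        # Hypothetical values, not from the question
        "P_Alighted": [1.0, np.nan, 3.0, np.nan, 4.0, 2.0, np.nan, 6.0, np.nan, 8.0],
    },
    index=idx,
)

# Group rows by date and hour; the lambda runs once per column per group,
# so every column is filled with its own rounded hourly median
filled = df1.groupby(df1.index.strftime("%Y-%m-%d %H")).transform(
    lambda x: x.fillna(np.round(x.median()))
)
print(filled)
```

If df1 had further columns that should be left alone, selecting the two columns first (for example `df1[["P_Boarded", "P_Alighted"]]`) before grouping would restrict the fill to them.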

0 Answers