Pandas / Python - Replacing values with a calculated Median according to specific hour

Question

my dataframe looks like this:

        Date        Name    Speed   DOW     Year    m   d   hour
64603   06/02/18    NB_MID  NaN     Tuesday 2018    2   6   0
64604   06/02/18    NB_MID  NaN     Tuesday 2018    2   6   0
64608   06/02/18    NB_MID  43.600  Tuesday 2018    2   6   0
64610   06/02/18    NB_MID  4.630   Tuesday 2018    2   6   0
64616   06/02/18    NB_MID  24.633  Tuesday 2018    2   6   0
... ... ... ... ... ... ... ... ...
503754  27/02/18    NB_MID  85.088  Tuesday 2018    2   27  23
503756  27/02/18    NB_MID  15.982  Tuesday 2018    2   27  23
503761  27/02/18    NB_MID  51.725  Tuesday 2018    2   27  23
503762  27/02/18    NB_MID  3.016   Tuesday 2018    2   27  23
503763  27/02/18    NB_MID  3.833   Tuesday 2018    2   27  23

The 'Speed' column has some missing values. To replace the missing SPEED values, I want to use a median calculated for the particular hour of the day when the missing value occurs. This median value would then be used to replace every NaN for that particular hour of the day.

There is 24 hours in the day (0-24) and therefore it makes sense to calculate 24 different median values.

I can use: df_median = df["Speed"].median() to find a median for the whole dataframe. And therefore I could use something like this:

df1 = df.loc[df['hour'] == 1]
df_median = df1["Speed"].median()
df_median

OUTPUT: 40.56

I could keep repeating this code for every hour of the day, but the way I would do it would be to create a new dataframe each time and change the hour each time... like this:

df2 = df.loc[df['hour'] == 2]
df_median2 = df2["Gap"].median()
df_median2

Surely there is a better way to do it? Also, what would then be the best way to replace each NaN with the correct median? For example, when 'hour' column = 1 replace all NaN 'Speed' values with 40.56

I hope I have explained this well enough. I think the way I am doing this could be improved. Thanks for any help.

score 0 · Answer 1 · answered Nov 14 '20 at 04:52

0

I have solved it using this;

How to replace NaN values in Pandas conditionally?

Looks like I did not search hard enough!!!

answered Nov 14 '20 at 04:52

qwerty654321

41
3

Pandas / Python - Replacing values with a calculated Median according to specific hour

1 Answers1