How to fill na based on other value in the same group by Python

Question

I want to fill NA in my dataset based on the value from the other row in the same group.

The data looks like this.

 group_id, start_time, end_time  
    1, NA, 20000
    1, 40000, 20000
    1, 30000, NA
    2, NA, 35000
    2, 45000, 22000
    2, 50000,21000
    2, 45000, NA

I want to get this result:

 group_id, start_time, end_time  
        1, 30000, 20000
        1, 40000, 20000
        1, 30000, 20000
        2, 45000, 35000
        2, 45000, 22000
        2, 50000,21000
        2, 45000, 35000

So the first and last value on each group are same on both start_time, and end_time .

So you want to fill it in with the group minimum for start_time and the group maximum for end_time? — Dan, May 28 '20 at 10:57
So the value of start_time and end_time in both first and last row of group are the same. As can be seen that the first NA in group _id 1 is 30000 , which is based on the third row. — nne, May 28 '20 at 11:01
Hi, welcome to SO! Please, take a look at [how-to-ask](https://stackoverflow.com/help/how-to-ask). It will be useful for you and the community. Tell us what you have tried so far. — Francesco, May 28 '20 at 11:11
Does this answer your question? [Pandas: filling missing values by mean in each group](https://stackoverflow.com/questions/19966018/pandas-filling-missing-values-by-mean-in-each-group) — Dan, May 28 '20 at 11:22

Dan · Accepted Answer · 2020-05-28T11:16:40.883

1

You can do this using fillna, groupby, tranform and the first or last aggregation functions, as explained in this answer

df['start_time'] = df['start_time'].fillna(df.groupby('group_id')['start_time'].transform('last'))
df['end_time'] = df['end_time'].fillna(df.groupby('group_id')['end_time'].transform('first'))

edited May 28 '20 at 11:16

answered May 28 '20 at 11:06

Dan

45,079
17
88
157

anjsimmo · Answer 2 · 2020-05-28T11:27:23.130

You can use numpy.where to select a value from one column (of the default values) if the condition for that row (is.na) matches, else select values from some other column (in this case, the same column).

import pandas as pd
import numpy as np
from io import StringIO

TESTDATA = """
group_id,start_time,end_time
1,NA,20000
1,40000,20000
1,30000,NA
2,NA,35000
2,45000,22000
2,50000,21000
2,45000,NA
"""
df = pd.read_csv(StringIO(TESTDATA), sep=",") # parse your sample data

In this case, the default values you asked for are the first/last of each group, so we will need to construct columns for these:

start_time_last = df.groupby("group_id")["start_time"].last()
end_time_first = df.groupby("group_id")["end_time"].first()
merged = df.join(start_time_last, on='group_id', how='left', rsuffix='_last').join(end_time_first, on='group_id', how='left', rsuffix='_first')

Now we can use np.where to fill in the na values:

merged["start_time"] = np.where(pd.isna(merged["start_time"]), merged["start_time_last"], merged["start_time"])
merged["end_time"] = np.where(pd.isna(merged["end_time"]), merged["end_time_first"], merged["end_time"])
print(merged)

Gives:

   group_id  start_time  end_time  start_time_last  end_time_first
0         1     30000.0   20000.0          30000.0         20000.0
1         1     40000.0   20000.0          30000.0         20000.0
2         1     30000.0   20000.0          30000.0         20000.0
3         2     45000.0   35000.0          45000.0         35000.0
4         2     45000.0   22000.0          45000.0         35000.0
5         2     50000.0   21000.0          45000.0         35000.0
6         2     45000.0   35000.0          45000.0         35000.0

You can use `transform` to avoid needing to do all that joining. Once you have that, you can just use fillna to fill in the values. — Dan, May 28 '20 at 11:21
@Dan Didn't notice your answer until after I posted, but your solution using `transform` is much more elegant! — anjsimmo, May 28 '20 at 11:25

score -1 · Answer 3 · answered May 28 '20 at 10:55

-1

You could make a loop to iterate over that and if the value is NA change it

answered May 28 '20 at 10:55

Tiago Oliveira

47
1
12

How to fill na based on other value in the same group by Python

3 Answers3