Nested ifelse alternative in pandas

Question

Suppose we've got a test dataset:

value   group
123 1
120 1
NA  1
130 1
23  2
22  2
24  2
NA  2

Now we want to replace missing values with group-wise median values. In R we can do it using a nested ifelse call.

first.med <- median(test[test$group == 1, ]$value, na.rm = T)
second.med <- median(test[test$group == 2, ]$value, na.rm = T)

test$value <- ifelse(is.na(test$value) & test$group == 1, first.med
                     ifelse(is.na(test$value) & test$group == 2, second.med,
                            test$value))

I though about applying the numpy.where function or the pandas.DataFrame.Set.map method as showcased here, but both techniques do not support nesting. I can think of a list comprehension to do this, but I wish to know if there is an alternative in the realm of NumPy/pandas. Thank you in advance.

why do you need to do nested if-else statements? for example, in r this is generally bad practice, and you can simply do `with(test, ave(value, group, FUN = function(x) {x[is.na(x)] <- median(x, na.rm = TRUE); x}))` which will work for n groups — rawr, Aug 30 '15 at 21:46
You need to learn about [groupby /"Split-Apply-Combine"](http://pandas.pydata.org/pandas-docs/stable/groupby.html). It's a very powerful paradigm. Also much cleaner code. Per @chrisb's answer — smci, Aug 30 '15 at 22:18
@rawr I'm simply used to `ifelse`. Your snippet does look a lot better. — Eli Korvigo, Aug 31 '15 at 08:11

score 3 · Accepted Answer · answered Aug 30 '15 at 22:11

In this case, you can use a groupby to fill by the group median:

In [16]: df.groupby('group')['value'].apply(lambda x: x.fillna(x.median()))
Out[16]: 
0    123
1    120
2    123
3    130
4     23
5     22
6     24
7     23
dtype: float64

Although in general, both of those methods can be nested just fine. E.g., you could do:

In [23]: medians = df.groupby('group')['value'].median()

In [24]: np.where(pd.isnull(df['value']), 
           np.where(df['group'] == 1, medians.loc[1], medians.loc[2]),    
           df['value'])
Out[24]: array([ 123.,  120.,  123.,  130.,   23.,   22.,   24.,   23.])

Nader Hisham · Answer 2 · 2015-08-31T06:00:41.003

1

df = pd.DataFrame({'value' : [123,120,np.nan ,130,23 ,22 ,24 ,np.nan] , 'group' : [1 , 1 ,1 , 1 , 2 , 2 , 2 , 2] })

def replace_with_median(df):
    df['value'][pd.isnull(df['value'])] = df['value'].median()
    return df

df.groupby('group').apply(replace_with_median)

edited Aug 31 '15 at 06:00

answered Aug 31 '15 at 05:47

Nader Hisham

5,214
4
19
35

I don't see how this is aware of the group. – Eli Korvigo Aug 31 '15 at 06:02
`df.groupby('group').apply(replace_with_median)` here it's grouped by group column – Nader Hisham Aug 31 '15 at 06:04

Nested ifelse alternative in pandas

2 Answers2