Pandas: Conditionally replace values based on other columns values

Question

I have a dataframe (df) that looks like this:

                    environment     event   
time                    
2017-04-28 13:08:22     NaN         add_rd  
2017-04-28 08:58:40     NaN         add_rd  
2017-05-03 07:59:35     test        add_env
2017-05-03 08:05:14     prod        add_env
...

Now my goal is for each add_rd in the event column, the associated NaN-value in the environment column should be replaced with a string RD.

                    environment     event   
time                    
2017-04-28 13:08:22     RD          add_rd  
2017-04-28 08:58:40     RD          add_rd  
2017-05-03 07:59:35     test        add_env
2017-05-03 08:05:14     prod        add_env
...

What I did so far

I stumbled across df['environment'] = df['environment'].fillna('RD'), which replaces every NaN (not what I am looking for); pd.isnull(df['environment']), which detects missing values; and np.where(df['environment'], x, y), which seems to be what I want but isn't working. I also tried this:

import pandas as pd

for env in df['environment']:
    if pd.isnull(env) and df['event'] == 'add_rd':
        env = 'RD'

This loop has no index or iterator to access the corresponding value in the event column.
I also tried this:

df['environment'] = np.where(pd.isnull(df['environment']), df['environment'] = 'RD', df['environment'])

SyntaxError: keyword can't be an expression

which obviously didn't work.

I took a look at several questions but couldn't build on the suggestions in the answers: Black's question, Simon's question, szli's question, Jan Willems Tulp's question.

So, how do I replace a value in a column based on another column's values?

Use `df.loc[df['environment'].isnull() & df['event'].eq('add_rd'), 'environment'] = 'RD'` — Zero, Sep 07 '18 at 14:04
Your last attempt is very close - I think you just need to change it to `df['environment'] = np.where(pd.isnull(df['environment']), 'RD', df['environment'])` — asongtoruin, Sep 07 '18 at 14:27
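
Note that np.where with pd.isnull alone would still fill every NaN, whatever the event. A minimal sketch of combining both conditions, assuming a made-up sample frame (the last row is not from the question; it is added to show that its NaN survives):

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; the last row has a NaN that must stay untouched.
df = pd.DataFrame({
    "environment": [np.nan, np.nan, "test", np.nan],
    "event": ["add_rd", "add_rd", "add_env", "add_env"],
})

# True only where the environment is missing AND the event is 'add_rd'.
needs_rd = df["environment"].isna() & df["event"].eq("add_rd")

# np.where(condition, value_if_true, value_if_false)
df["environment"] = np.where(needs_rd, "RD", df["environment"])
```

Only the two add_rd rows are filled; the NaN in the add_env row is left as it was.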

score 8 · Accepted Answer · answered Sep 07 '18 at 14:51

Now my goal is for each add_rd in the event column, the associated NaN-value in the environment column should be replaced with a string RD.

As per @Zero's comment, use pd.DataFrame.loc and Boolean indexing:

df.loc[df['event'].eq('add_rd') & df['environment'].isnull(), 'environment'] = 'RD'
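
For anyone who wants to try this end to end, here is a small runnable sketch. The frame mirrors the one in the question, plus one extra add_env row with a missing environment (a made-up row, with a made-up timestamp) to check that it is not touched:

```python
import numpy as np
import pandas as pd

# Data from the question, plus one hypothetical row (the last one).
df = pd.DataFrame(
    {
        "environment": [np.nan, np.nan, "test", "prod", np.nan],
        "event": ["add_rd", "add_rd", "add_env", "add_env", "add_env"],
    },
    index=pd.to_datetime([
        "2017-04-28 13:08:22",
        "2017-04-28 08:58:40",
        "2017-05-03 07:59:35",
        "2017-05-03 08:05:14",
        "2017-05-03 09:00:00",  # hypothetical timestamp
    ]),
)
df.index.name = "time"

# Boolean mask: rows whose event is 'add_rd' and whose environment is missing.
mask = df["event"].eq("add_rd") & df["environment"].isnull()

# Assign only to the selected rows of the 'environment' column.
df.loc[mask, "environment"] = "RD"
```

Building the mask as a separate variable makes it easy to inspect (for example with mask.sum()) before assigning.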

score 5 · Answer 2 · CT Zhu

You could consider using where:

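# where keeps a value when the condition is True and replaces it otherwise,
# so the condition is the negation of "environment is NaN and event is add_rd"
df['environment'] = df['environment'].where(
    df['environment'].notnull() | (df['event'] != 'add_rd'), 'RD')

Where the condition is not met, the value is replaced by the second argument ('RD').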
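
Since where needs the condition for the rows to keep, it can be easier to write the condition for the rows to replace and use Series.mask, which is the inverse of where. A small sketch on made-up data (the last row is hypothetical and should stay NaN):

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; only the first two rows should become 'RD'.
df = pd.DataFrame({
    "environment": [np.nan, np.nan, "test", np.nan],
    "event": ["add_rd", "add_rd", "add_env", "add_env"],
})

# Rows to replace: environment is missing and the event is 'add_rd'.
to_replace = df["environment"].isna() & df["event"].eq("add_rd")

# where keeps values where the condition is True, so it takes the negated mask.
via_where = df["environment"].where(~to_replace, "RD")

# mask replaces values where the condition is True.
via_mask = df["environment"].mask(to_replace, "RD")
```

Both calls return a new Series, so the result still has to be assigned back to df['environment'].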

edited Sep 07 '18 at 23:45

answered Sep 07 '18 at 14:37


score 2 · Answer 3 · answered Nov 03 '21 at 21:21

Replace values in a specific column using DataFrame.loc:

In [1]: import numpy as np
        import pandas as pd

In [2]: dictionary = {'time': ['2017-04-28 13:08:22', '2017-04-28 08:58:40',
                               '2017-05-03 07:59:35', '2017-05-03 08:05:14'],
                      # real missing values rather than the string 'NaN'
                      'environment': [np.nan, np.nan, 'test', 'prod'],
                      'event': ['add_rd', 'add_rd', 'add_env', 'add_env']
                     }

In [3]: df = pd.DataFrame(dictionary, columns=['time', 'environment', 'event'])
        print(df)

Out [3]:                  time environment    event
         0  2017-04-28 13:08:22         NaN   add_rd
         1  2017-04-28 08:58:40         NaN   add_rd
         2  2017-05-03 07:59:35        test  add_env
         3  2017-05-03 08:05:14        prod  add_env

In [4]: # only fill rows where the event is 'add_rd' and environment is missing
        df.loc[(df['event'] == 'add_rd') & df['environment'].isna(), 'environment'] = 'RD'
        print(df)
        
Out [4]:                  time environment    event
         0  2017-04-28 13:08:22          RD   add_rd
         1  2017-04-28 08:58:40          RD   add_rd
         2  2017-05-03 07:59:35        test  add_env
         3  2017-05-03 08:05:14        prod  add_env

score 0 · Answer 4 · answered Sep 07 '18 at 14:39

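If you want to fill only the 'add_rd' rows with 'RD', this may be useful:

keys_to_replace = {'add_rd': 'RD', 'add_env': 'simple'}
# restrict the fill to 'add_rd' rows; a constant fillna after groupby would fill every NaN
is_add_rd = df['event'] == 'add_rd'
df.loc[is_add_rd, 'environment'] = df.loc[is_add_rd, 'environment'].fillna(keys_to_replace['add_rd'])
df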

output:

    environment event
0   RD          add_rd
1   RD          add_rd
2   test        add_env
3   prod        add_env

If you have many values to replace based on event, you can group by the 'event' column and fill each group with its own value:

keys_to_replace = {'add_rd':'RD','add_env':'simple'}
temp = df.groupby(['event']).apply(lambda x:  x['environment'].fillna(keys_to_replace[x['event'].values[0]]))
temp.index = temp.index.droplevel(0)
# assign by index label so rows stay aligned even if df's index is not sorted
df['environment'] = temp

output:

   environment  event
0   RD          add_rd
1   RD          add_rd
2   test        add_env
3   prod        add_env
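
A shorter way to get per-event defaults, offered as an alternative sketch rather than as what this answer does: map the event column through the dictionary and pass the result to fillna. The sample rows below are made up; 'other' is a hypothetical event with no entry in keys_to_replace.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; 'other' has no default in keys_to_replace.
df = pd.DataFrame({
    "environment": [np.nan, np.nan, "test", np.nan, np.nan],
    "event": ["add_rd", "add_rd", "add_env", "add_env", "other"],
})
keys_to_replace = {"add_rd": "RD", "add_env": "simple"}

# Map each row's event to its default; events without an entry map to NaN.
defaults = df["event"].map(keys_to_replace)

# fillna with a Series aligns on the index and only fills missing entries.
df["environment"] = df["environment"].fillna(defaults)
```

Existing values such as 'test' are kept, missing ones get the default for their event, and rows whose event has no default stay NaN.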

No need for all this. I know the question has already been answered but this could have been done with a simple fillna — Herc01, Dec 15 '19 at 09:15

score -1 · Answer 5 · answered Dec 15 '19 at 09:17

Here it is:

 df['environment']=df['environment'].fillna('RD')

Herc01


"I stumbled across df['environment'] = df['environment'].fillna('RD') which replaces every NaN (which is not what I am looking for)," Thanks for your contribution, but read the question pls. I already found this one – Martin Müsli, Jan 01 '20 at 18:36
