1

What's the best way to process values in a specific set of Dataframe columns only if it's not null?

my original code:

for i in columns_to_process:
    df[i] = df[i].astype(str) + '!'

#columns_to_process is a list of selected df column name strings

I realised that this will turn null values into nan! but I just want to keep them nulls.

I've looked into using .apply() and a lambda function but that's bad practice when working on individual columns apparently, according to: Pandas: How can I use the apply() function for a single column? . Seems like .apply() is more for updating every column in the dataframe.

Then I came across this: replace non Null values in column by 1 and managed to make this working solution:

for i in columns_to_process:
    df.loc[df[i].notnull(), i] = df.loc[df[i].notnull(), i].astype(str) + '!'

It took me a long time to figure this out and doesn't look very elegant, so I'm wondering if there is a better / more pythonic way to do this?

Rory LM
  • 160
  • 2
  • 15

2 Answers2

2

IIUC, try pandas.DataFrame.where:

# Sample df
df = pd.DataFrame({'foo': ['one', 'one', 'one', 'two', 'two',
                            'two'],
                   'bar': ['A', 'B', np.nan, 'A', 'B', 'C'],
                   'baz': [1, 2, np.nan, 4, 5, 6],
                   'zoo': ['x', 'y', 'z', 'q', 'w', 't']})

columns_to_process = ['bar', 'baz']
df[columns_to_process] = df[columns_to_process].where(df[columns_to_process].isna(), lambda x: x.astype(str)+'!')
df

Output:

   bar    baz  foo zoo
0  A!!  1.0!!  one   x
1  B!!  2.0!!  one   y
2  NaN    NaN  one   z
3  A!!  4.0!!  two   q
4  B!!  5.0!!  two   w
5  C!!  6.0!!  two   t
Rory LM
  • 160
  • 2
  • 15
Chris
  • 29,127
  • 3
  • 28
  • 51
  • Thanks, looks promising. But how would I apply this to specific columns only, instead of the whole DataFrame Like my example with `columns_to_process` above? – Rory LM Jan 10 '20 at 02:44
  • @RoryLM I've edited the answer. Let me know if it works :) – Chris Jan 10 '20 at 03:53
  • Cheers for fast response. Works now I've made minor change (see edit). Do you know if `.where()` has much of an advantage over my original method with `.loc()`? – Rory LM Jan 10 '20 at 05:24
1

Use DataFrame.where with DataFrame.add

df.astype(str).add('!').where(df.notnull())
#to fill NaN with the previous null values
#df.astype(str).add('!').where(df.notnull(),df) 
ansev
  • 30,322
  • 5
  • 17
  • 31