Python/Pandas - What is the most efficient way to replace values in specific columns

Question

Suppose you have a data frame

df = pd.DataFrame({'a':[1,2,3,4],'b':[2,4,6,8],'c':[2,4,5,6]})

and you want to replace specific values in columns 'a' and 'c' (but not 'b'). For example, replacing 2 with 20, and 4 with 40.

The following does not work, because `df[['a','c']]` returns a copy and the replacement is applied to that copy rather than to `df`:

df[['a','c']].replace({2:20, 4:40}, inplace=True)
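A minimal sketch of why that line has no visible effect, assuming a reasonably recent pandas version: selecting a list of columns builds a new DataFrame, so the replacement ends up on that temporary object. The names `selected` and `replaced` are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3, 4], 'b': [2, 4, 6, 8], 'c': [2, 4, 5, 6]})

# Selecting with a list of columns builds a new DataFrame, not a view of df.
selected = df[['a', 'c']]
replaced = selected.replace({2: 20, 4: 40})

# The replacement exists only in the new object; df itself is unchanged.
print(replaced['a'].tolist())  # [1, 20, 3, 40]
print(df['a'].tolist())        # [1, 2, 3, 4]
```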

A loop will work:

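for col in ['a','c']:
    # Assign the result back; calling inplace on df[col] is chained assignment
    # and may not update df in newer pandas versions.
    df[col] = df[col].replace({2: 20, 4: 40})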

But a loop seems inefficient. Is there a better way to do this?

`df[['a', 'c']] = df[['a', 'c']].apply(lambda x: x.map({2: 20, 4: 40}).fillna(x).astype(int))` - as described in the marked duplicate. - jpp, Jan 08 '19 at 15:05

score 0 · Accepted Answer · answered Jan 08 '19 at 15:03

According to the documentation for `DataFrame.replace`, you can pass a nested dictionary that maps each column name to its own replacement dictionary:

import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3, 4], 'b': [2, 4, 6, 8], 'c': [2, 4, 5, 6]})
lookup = {col: {2: 20, 4: 40} for col in ['a', 'c']}
df.replace(lookup, inplace=True)
print(df)

Output

    a  b   c
0   1  2  20
1  20  4  40
2   3  6   5
3  40  8   6
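As a sketch of an equivalent form that avoids `inplace=True`, the selected columns can be replaced and assigned back in one statement. This uses the same example data; `cols` and `mapping` are illustrative names, not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3, 4], 'b': [2, 4, 6, 8], 'c': [2, 4, 5, 6]})
cols = ['a', 'c']            # columns to modify (illustrative name)
mapping = {2: 20, 4: 40}     # old value -> new value

# Replace on the selection, then assign the result back to the same columns.
df[cols] = df[cols].replace(mapping)

# Column 'b' is not in cols, so its 2 and 4 stay as they are.
print(df['a'].tolist())  # [1, 20, 3, 40]
print(df['b'].tolist())  # [2, 4, 6, 8]
print(df['c'].tolist())  # [20, 40, 5, 6]
```

Whether this is faster than the nested-dictionary call depends on the data; it is mainly a matter of readability.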
