I'm trying to replace values in a Pandas data frame, based on certain criteria on multiple columns. For a single column criteria this can be done very elegantly with a dictionary (e.g. Remap values in pandas column with a dict):
import pandas as pd
df = pd.DataFrame({'col1': {0:1, 1:1, 2:2}, 'col2': {0:10, 1:20, 2:20}})
rdict = {1:'a', 2:'b'}
df2 = df.replace({"col1": rdict})
Input df
:
col1 col2
0 1 10
1 1 20
2 2 20
Resulting df2
:
col1 col2
0 a 10
1 a 20
2 b 20
I'm trying to extend this to criteria over multiple columns (e.g. where col1==1, col2==10
-> replace). For a single criteria this can be done like:
df3=df.copy()
df3.loc[((df['col1']==1)&(df['col2']==10)), 'col1'] = 'c'
Which results in a df3
:
col1 col2
0 c 10
1 1 20
2 2 20
My real life problem has a large number of criteria, which would involve a large number of df3.loc[((criteria1)&(criteria2)), column] = value
calls, which is far less elegant the the replacement using a dictionary as a "lookup table". Is it possible to extend the elegant solution (df2 = df.replace({"col1": rdict})
) to a setup where values in one column are replaced by criteria based on multiple columns?
An example of what I'm trying to achieve (although in my real life case the number of criteria is a lot larger):
df = pd.DataFrame({'col1': {0:1, 1:1, 2:2, 3:2}, 'col2': {0:10, 1:20, 2:10, 3:20}})
df3=df.copy()
df3.loc[((df['col1']==1)&(df['col2']==10)), 'col1'] = 'a'
df3.loc[((df['col1']==1)&(df['col2']==20)), 'col1'] = 'b'
df3.loc[((df['col1']==2)&(df['col2']==10)), 'col1'] = 'c'
df3.loc[((df['col1']==2)&(df['col2']==20)), 'col1'] = 'd'
Input df
:
0 1 10
1 1 20
2 2 10
3 2 20
Resulting df3
:
col1 col2
0 a 10
1 b 20
2 c 10
3 d 20