
I want to replace some values in a column of a dataframe using a dictionary that maps the old codes to the new codes.

di = dict( { "myVar": {11:0, 204:11} } )
mydata.replace( to_replace = di, inplace = True )

But some of the new codes and old codes overlap. When I use the dataframe's .replace method, I get the error 'Replacement not allowed with overlapping keys and values'.

My current workaround is to replace the offending keys manually and then apply the dictionary to the remaining non-overlapping cases.

mydata.loc[ mydata.myVar == 11, "myVar" ] = 0 
di = dict( { "myVar": {204:11} } )
mydata.replace( to_replace = di, inplace = True )

Is there a more compact way to do this?

Nirvan
  • I recently came across the same issue. pandas' replace method can't be coaxed into the behavior you want because it applies the replacements you provide sequentially, so it might replace values that have already been replaced. Hence the replacement error message. You are probably looking for a true *recode* method, analogous to SPSS's recode. At the moment, I don't think there is a built-in pandas way to do this. – pansen Feb 24 '17 at 15:01
  • I found an answer using the .map method on a series. It seems to be working fine so far. I just posted an answer. – Nirvan Feb 24 '17 at 21:05

1 Answer


I found an answer here that uses the .map method on a series in conjunction with a dictionary. Here's an example recoding dictionary with overlapping keys and values.

>>> import pandas as pd
>>> df = pd.DataFrame( [1,2,3,4,1], columns = ['Var'] )
>>> df
   Var
0    1
1    2
2    3
3    4
4    1
>>> mapping = {1:2, 2:3, 3:1, 4:3}  # named "mapping" to avoid shadowing the built-in dict
>>> df.Var.map( mapping )
0    2
1    3
2    1
3    3
4    2
Name: Var, dtype: int64
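
Applied to the codes from the question, the same idea might look like the sketch below. The contents of mydata are made up for illustration and deliberately contain only values that appear as keys in the dictionary. Note that map returns a new series, so the result has to be assigned back to the column.

```python
import pandas as pd

# Hypothetical data; only the column name and the codes come from the question.
mydata = pd.DataFrame({"myVar": [11, 204, 11, 204]})
recode = {11: 0, 204: 11}

# map looks up each original value exactly once, so the 11 produced
# from 204 is not turned into 0 afterwards.
mydata["myVar"] = mydata["myVar"].map(recode)

result = mydata["myVar"].tolist()
print(result)
```

Because every lookup is made against the original values, overlapping keys and values are not a problem here.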

UPDATE:

With map, any value in the original series that is not a key in the mapping dictionary becomes NaN, so the dictionary has to cover every value you want to keep. Because NaN is a float, the result is also upcast from int64 to float64.

>>> df = pd.DataFrame( [1,2,3,4,1], columns = ['Var'] )
>>> mapping = {1:2, 2:3, 3:1}  # 4 is intentionally missing
>>> df.Var.map( mapping )
0    2.0
1    3.0
2    1.0
3    NaN
4    2.0
Name: Var, dtype: float64
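
If the unmapped values should be left unchanged instead, one possible approach is to fall back to the original value. The sketch below shows two ways of doing that; the names mapping, recoded and recoded_alt are only illustrative, and the fillna variant assumes the original column contains no NaN and that the dictionary never maps to NaN on purpose.

```python
import pandas as pd

df = pd.DataFrame([1, 2, 3, 4, 1], columns=["Var"])
mapping = {1: 2, 2: 3, 3: 1}  # 4 is not a key

# Variant 1: map, then fill the resulting NaN with the original values.
# The intermediate result is float64, so cast back to the original dtype.
recoded = df["Var"].map(mapping).fillna(df["Var"]).astype(df["Var"].dtype)

# Variant 2: look each value up with a default of itself.
# No NaN is ever produced, so the integer dtype is preserved.
recoded_alt = df["Var"].map(lambda v: mapping.get(v, v))

print(recoded.tolist())
print(recoded_alt.tolist())
```

The second variant calls a Python function per element, so it may be slower on large columns, but it avoids the detour through float64.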
Nirvan