Assume I have the following data set:

import pandas as pd

lst = ['u', 'v', 'w', 'x', 'y']
lst_rev = list(reversed(lst))
dct = dict(zip(lst, lst_rev))  # maps each value to its mirror: 'u' -> 'y', 'v' -> 'x', ...

df = pd.DataFrame({'A': ['a', 'b', 'a', 'c', 'a'],
                   'B': lst},
                  dtype='category')

Now I want to replace the values of column B in df using dct.

I know I can do

df.B.map(dct).fillna(df.B)

to get the expected output, but when I tried replace (which seems more straightforward to me), it did not give the result I expected.

The output is shown below:

df.B.replace(dct)
Out[132]: 
0    u
1    v
2    w
3    v
4    u
Name: B, dtype: object

which is different from the output of

df.B.map(dct).fillna(df.B)
Out[133]: 
0    y
1    x
2    w
3    v
4    u
Name: B, dtype: object

I can guess what is happening (see below), but why does it happen?

0    u --> change to y then change to u
1    v --> change to x then change to v
2    w
3    v
4    u

Appreciate your help.

BENY

2 Answers

It's because replace keeps applying the dictionary, so a value produced by one replacement can be matched and replaced again by a later key:

df.B.replace({'u': 'v', 'v': 'w', 'w': 'x', 'x': 'y', 'y': 'Hello'})

0    Hello
1    Hello
2    Hello
3    Hello
4    Hello
Name: B, dtype: object

With the given dct, 'u' -> 'y' and then 'y' -> 'u', so the value ends up where it started.
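
The chaining can be reproduced without pandas. The following is only a minimal sketch of the idea, not pandas's actual implementation; sequential_replace and one_pass_replace are hypothetical helper names made up for illustration. Applying the pairs one at a time gives the same result as the replace output in the question, while looking each original value up once gives the map result.

```python
def sequential_replace(values, mapping):
    """Apply each old -> new pair in turn to the whole list.

    A value written by an earlier pair can be rewritten by a later one.
    """
    result = list(values)
    for old, new in mapping.items():
        result = [new if v == old else v for v in result]
    return result


def one_pass_replace(values, mapping):
    """Look each original value up exactly once; unmapped values are kept."""
    return [mapping.get(v, v) for v in values]


lst = ['u', 'v', 'w', 'x', 'y']
dct = dict(zip(lst, reversed(lst)))

print(sequential_replace(lst, dct))  # ['u', 'v', 'w', 'v', 'u']
print(one_pass_replace(lst, dct))    # ['y', 'x', 'w', 'v', 'u']
```

Note that 'v' in row 3 and 'u' in row 4 only look correct in the sequential version because the 'x' and 'y' pairs are applied after 'u' and 'v' have already been rewritten.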

piRSquared

This behavior is not intended and has been recognized as a bug.

This is the GitHub issue that first identified the behavior; the fix was assigned to the pandas 0.24.0 milestone. I can confirm that the replacement works as expected in the current version on GitHub.

Here is the PR containing the fix.

user3483203