
In R, there is a rather useful replace function. Essentially, it does conditional reassignment in a given column of a data frame. It can be used like so: replace(df$column, df$column==1, 'Type 1')

What is a good way to achieve the same in pandas?

Should I use a lambda with apply? (If so, how do I get a reference to the given column, as opposed to a whole row?)

Should I use np.where on data_frame.values? It seems like I am missing a very obvious thing here.

Any suggestions are appreciated.

ivan-k

2 Answers


pandas has a replace method too:

In [24]: from pandas import DataFrame

In [25]: df = DataFrame({1: [2,3,4], 2: [3,4,5]})

In [26]: df
Out[26]: 
   1  2
0  2  3
1  3  4
2  4  5

In [27]: df[2]
Out[27]: 
0    3
1    4
2    5
Name: 2

In [28]: df[2].replace(4, 17)
Out[28]: 
0     3
1    17
2     5
Name: 2

In [29]: df[2].replace(4, 17, inplace=True)
Out[29]: 
0     3
1    17
2     5
Name: 2

In [30]: df
Out[30]: 
   1   2
0  2   3
1  3  17
2  4   5
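In recent pandas versions, replace(..., inplace=True) returns None, and calling it on a selected column may not update the parent frame. A minimal sketch of the same replacement written as a plain reassignment, plus a DataFrame-level variant that uses a nested dict to restrict the replacement to one column (the frame is the one from the transcript above; `replaced` is just an illustrative name):

```python
import pandas as pd

df = pd.DataFrame({1: [2, 3, 4], 2: [3, 4, 5]})

# Reassign the column instead of relying on inplace=True.
df[2] = df[2].replace(4, 17)

# DataFrame.replace with {column: {old: new}} only touches that column,
# so the 4 in column 1 is left alone.
fresh = pd.DataFrame({1: [2, 3, 4], 2: [3, 4, 5]})
replaced = fresh.replace({2: {4: 17}})

print(df)
print(replaced)
```

Both forms return new objects, so they behave the same regardless of how the column was selected.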

or you could use numpy-style advanced indexing:

In [47]: df[1]
Out[47]: 
0    2
1    3
2    4
Name: 1

In [48]: df[1] == 4
Out[48]: 
0    False
1    False
2     True
Name: 1

In [49]: df[1][df[1] == 4]
Out[49]: 
2    4
Name: 1

In [50]: df[1][df[1] == 4] = 19

In [51]: df
Out[51]: 
    1   2
0   2   3
1   3  17
2  19   5
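The chained form df[1][df[1] == 4] = 19 can trigger a chained-assignment warning in newer pandas and may not write through to the frame. A hedged sketch of the same conditional assignment with .loc, together with the np.where and Series.mask alternatives the question asks about (the names `relabelled` and `masked` are illustrative only):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({1: [2, 3, 4], 2: [3, 17, 5]})

# Select rows with a boolean mask and the column by label in one .loc call.
df.loc[df[1] == 4, 1] = 19

# np.where(condition, value_if_true, value_if_false) returns a NumPy array.
relabelled = np.where(df[2] == 17, -1, df[2])

# Series.mask replaces values where the condition is True and keeps the index.
masked = df[2].mask(df[2] == 17, -1)

print(df)
print(relabelled)
print(masked)
```

Of the three, .loc modifies the frame directly, while np.where and mask produce new values that you assign back yourself.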
DSM
  • It pains me that I did not read the manual attentively enough. – ivan-k Aug 28 '12 at 18:00
  • To be perfectly honest, I almost never read manuals either, until something really confuses me. But one advantage of using a smart interpreter like IPython is that you can build an object like `df` and then use tab-completion to see what methods live inside it. – DSM Aug 28 '12 at 18:05
  • That is indeed true. IPython is a thing of beauty. In my defence, the replace function is not listed [here](http://pandas.pydata.org/pandas-docs/stable/genindex.html) – ivan-k Aug 28 '12 at 18:36
  • Heh! Maybe my never-read-the-manual policy makes more sense than I thought! :^) – DSM Aug 28 '12 at 18:39
  • It is [here](http://pandas.pydata.org/pandas-docs/stable/missing_data.html#replacing-generic-values) though =P – Chang She Aug 29 '12 at 03:36

The pandas documentation for replace does not have any examples, so I will give some here. For those coming from R (like me), replace is basically an all-purpose replacement function that combines the functionality of the R functions plyr::mapvalues, plyr::revalue and stringr::str_replace_all. Since DSM covered the case of single values, I will cover the multi-value case.

Example series

In [10]: x = pd.Series([1, 2, 3, 4])

In [11]: x
Out[11]: 
0    1
1    2
2    3
3    4
dtype: int64

We want to replace the positive integers with negative integers (and not by multiplying with -1).

Two lists of values

One way to do this is to have one list (or pandas Series) of the values we want to replace and a second list with the values we want to replace them with.

In [14]: x.replace([1, 2, 3, 4], [-1, -2, -3, -4])
Out[14]: 
0   -1
1   -2
2   -3
3   -4
dtype: int64

This corresponds to plyr::mapvalues.

Dictionary of value pairs

Sometimes it's more convenient to have a dictionary of value pairs. Each key is a value we want to replace, and the corresponding dictionary value is what we replace it with.

In [15]: x.replace({1: -1, 2: -2, 3: -3, 4: -4})
Out[15]: 
0   -1
1   -2
2   -3
3   -4
dtype: int64

This corresponds to plyr::revalue.
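One detail worth seeing in code is what happens to values that are not in the dictionary. The sketch below uses a partial mapping (an assumption for illustration, not part of the example above) and compares replace with Series.map, which is sometimes suggested for the same job:

```python
import pandas as pd

x = pd.Series([1, 2, 3, 4])

# A partial mapping: only 1 and 2 have replacements.
mapping = {1: -1, 2: -2}

# replace leaves values without a matching key unchanged.
replaced = x.replace(mapping)

# map looks every value up in the dict, so missing keys become NaN.
mapped = x.map(mapping)

print(replaced)
print(mapped)
```

So replace is the closer match to plyr::revalue when only some values should change.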

Strings

It works similarly for strings, except that we also have the option of using regex patterns.

If we simply want to replace strings with other strings, it works exactly the same as before:

In [18]: s = pd.Series(["ape", "monkey", "seagull"])
In [22]: s
Out[22]: 
0        ape
1     monkey
2    seagull
dtype: object

Two lists

In [25]: s.replace(["ape", "monkey"], ["lion", "panda"])
Out[25]: 
0       lion
1      panda
2    seagull
dtype: object

Dictionary

In [26]: s.replace({"ape": "lion", "monkey": "panda"})
Out[26]: 
0       lion
1      panda
2    seagull
dtype: object

Regex

Replace every "a" with "x".

In [27]: s.replace("a", "x", regex=True)
Out[27]: 
0        xpe
1     monkey
2    sexgull
dtype: object

Replace every "l" with "x".

In [28]: s.replace("l", "x", regex=True)
Out[28]: 
0        ape
1     monkey
2    seaguxx
dtype: object

Note that both occurrences of "l" in "seagull" were replaced.

Replace every "a" with "x" and every "l" with "p".

In [29]: s.replace(["a", "l"], ["x", "p"], regex=True)
Out[29]: 
0        xpe
1     monkey
2    sexgupp
dtype: object

In the special case where one wants to replace multiple different values with the same value, one can simply pass a single string as the replacement. It must not be inside a list. Replace every "a" and "l" with "p".

In [30]: s.replace(["a", "l"], "p", regex=True)
Out[30]: 
0        ppe
1     monkey
2    sepgupp
dtype: object

(Credit to DaveL17 in the comments)
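To make the role of regex=True explicit, here is a small sketch on the same series. Without it, replace only matches whole values, so a single letter matches nothing. The sketch also shows the dictionary form passed through the regex keyword and, as an alternative, the string accessor str.replace for plain substring replacement (the variable names are illustrative):

```python
import pandas as pd

s = pd.Series(["ape", "monkey", "seagull"])

# Without regex=True, "a" must equal the whole value, so nothing changes.
whole_value = s.replace("a", "x")

# A dict of pattern -> replacement can be passed via the regex keyword.
by_pattern = s.replace(regex={"a": "x", "l": "p"})

# str.replace does literal substring replacement when regex=False.
substring = s.str.replace("a", "x", regex=False)

print(whole_value)
print(by_pattern)
print(substring)
```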

CoderGuy123
  • +1 for a nice series of examples. For future visitors, you can also replace multiple values with a single value `s.replace(["a", "l"], "x", regex=True)` but the single replacement value cannot be in a list (the 'from' and 'to' lists must be of equal value in order to work.) – DaveL17 Jan 30 '17 at 14:34
  • I added your example. – CoderGuy123 Jan 31 '17 at 03:05
  • Cheers. I can no longer edit my comment above, but it would be better described as (the 'from' and 'to' lists must be of equal *length* in order to work.) – DaveL17 Jan 31 '17 at 12:41