I have the following pandas dataframe. Say it has two columns: id and search_term:

id       search_term
37651    inline switch

I do:

train['search_term'] = train['search_term'].str.replace("in."," in. ")

expecting that the dataset above is unaffected, but I get in return for this dataset:

id       search_term
37651    in.  in.  switch

which means inl is replaced by in. and ine is replaced by in., as if I were using a regular expression, where the dot means any character.

How do I rewrite the first command so that, literally, in. is replaced by in. but any in not followed by a dot is untouched, as in:

a = 'inline switch'
a = a.replace('in.','in. ')

a
>>> 'inline switch'
smci
Alejandro Simkievich

3 Answers

In pandas 0.23 or newer, str.replace() has a regex option that controls whether the pattern is treated as a regular expression. The following simply turns regex matching off:

df.search_term.str.replace('in.', 'in. ', regex=False)

This results in:

0    inline switch
1         in. here
Name: search_term, dtype: object
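
To make the difference between the two modes concrete, here is a minimal sketch that runs the same call with regex matching off and on. The two-row frame is made up for illustration (it mirrors the sample output above), and regex is passed explicitly in both calls rather than relying on whatever the installed version's default is.

```python
import pandas as pd

# Hypothetical sample data, chosen to mirror the output shown above.
df = pd.DataFrame({'search_term': ['inline switch', 'in.here']})

# regex=False: 'in.' is matched literally, so 'inline' is left alone.
literal = df['search_term'].str.replace('in.', 'in. ', regex=False)

# regex=True: '.' matches any character, so 'inl' and 'ine' are both replaced.
as_regex = df['search_term'].str.replace('in.', 'in. ', regex=True)

print(literal.tolist())
print(as_regex.tolist())
```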
daisukelab

Here is the answer: use a regular expression that matches a literal dot.

str.replace() in pandas indeed uses regex, so that:

df['a'] = df['a'].str.replace('in.', ' in. ')

is not comparable to:

a.replace('in.', ' in. ')

The latter does not use regex. So if you really mean a literal dot and not any character, use '\.' instead of '.' wherever the pattern is treated as a regex.
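
If the search text comes from a variable, or contains several special characters, escaping by hand gets error-prone. One option is the standard library's re.escape, which backslash-escapes the metacharacters for you. A minimal sketch using plain re.sub (the variable names are only illustrative):

```python
import re

needle = 'in.'                 # text that should be matched literally
pattern = re.escape(needle)    # the dot is now backslash-escaped

unchanged = re.sub(pattern, 'in. ', 'inline switch')  # no literal 'in.' present
replaced = re.sub(pattern, 'in. ', 'in.here')         # literal 'in.' is found

print(unchanged)
print(replaced)
```

The escaped pattern can be passed to the pandas str.replace() call in the same way when regex matching is enabled.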

Regular Expression to match a dot

Community
Alejandro Simkievich

Try escaping the .:

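>>> import pandas as pd
>>> df = pd.DataFrame({'search_term': ['inline switch', 'in.here']})
>>> # regex=True is passed explicitly so the escaped pattern is treated as a regex
>>> df.search_term.str.replace('in\\.', 'in. ', regex=True)
0    inline switch
1         in. here
Name: search_term, dtype: object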
Ami Tavory
  • Thanks, Ami. I see you escaped the . in the first argument, but what about the second? If you want to literally replace 'in.' by 'in. ', should you use str.replace('in\\.', 'in\\. ') or str.replace('in\\.', 'in. ')? – Alejandro Simkievich Mar 29 '16 at 23:41
  • @AlejandroSimkievich It would seem logical, but no. See the updated example above. Only the dot in the first string is interpreted as a regex character (which must be escaped). – Ami Tavory Mar 29 '16 at 23:43
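
The point made in the comments can be checked with Python's re module directly. This is a small sketch, not part of the original answer: a dot in the replacement string is inserted as-is, and only backslash sequences such as \1 (a reference to a captured group) are treated specially there.

```python
import re

# The dot in the pattern must be escaped; the dot in the replacement is literal.
plain = re.sub(r'in\.', 'in. ', 'in.here')

# In the replacement, \1 inserts the text captured by the first group.
with_group = re.sub(r'(in)\.', r'\1. ', 'in.here')

print(plain)
print(with_group)
```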