1

I am trying to replace a custom set of characters with ' '. Here is what confuses me:

If I replace just one character, it works:

a=pd.DataFrame({'title':['a/b','a # b','a+b']})
a.loc[:,'title1']=a.loc[:,'title'].astype(str).str.replace('/',' ')
a

The result is:

   title title1
0    a/b    a b
1  a # b  a # b
2    a+b    a+b

If I use a short pattern that covers several characters, it also works:

b2='[?|:|-|\'|\\|/]'
a=pd.DataFrame({'title':['a/b','a # b','a+b']})
a.loc[:,'title1']=a.loc[:,'title'].astype(str).str.replace(b2,' ')
a

The result is:

   title title1
0    a/b    a b
1  a # b  a # b
2    a+b    a+b

But when I use a longer pattern, nothing changes:

b1='[?|:|-|\'|\\|.|(|)|[|]|{|}|/]'
a=pd.DataFrame({'title':['a/b','a # b','a+b']})
a.loc[:,'title1']=a.loc[:,'title'].astype(str).str.replace(b1,' ')
a

The result is:

   title title1
0    a/b    a/b
1  a # b  a # b
2    a+b    a+b

You can see that in the first two examples, / is replaced with ' '. But in the last one, the replacement does not happen, and I do not know why. Is there a limit on the length of the pattern string? Or is there a better way that I am not aware of?

Update

Thanks a lot @Oliver Hao. But what I want is to do this for one (or more) columns in a data frame, then save the result back to the data frame as a new column. So when I try:

regex = r"[?:\-'\\\|.()\[\]{}/]"
a.loc[:,'title1']=re.sub(regex," ",a.loc[:,'title'],0,re.MULTILINE)

I get this error:

Traceback (most recent call last):
  File "<input>", line 1, in <module>
  File "C:\Users\fefechen\AppData\Local\Programs\Python\Python37\lib\re.py", line 192, in sub
    return _compile(pattern, flags).sub(repl, string, count)
TypeError: expected string or bytes-like object
halfer
Feng Chen

3 Answers

1

This expression might also work, with fewer escapes:

b1="[|,.:;+–_#&@!$%()[\]{}?'\"\/\\-]"
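As a minimal sketch of why so few escapes are needed: inside a character class, most regex metacharacters (including `|`, `.`, `(`, `)`, `{`, `}`) are literal, so only `]`, `\` and `-` really need care. The `pattern` and `samples` below are hypothetical, chosen for illustration, and use the standard `re` module rather than pandas.

```python
import re

# Hypothetical pattern for illustration: a raw string in which only
# [, ], \ and - are escaped; everything else is literal inside [...].
pattern = r"[?:'.(){}/|#+,;_&@!$%\[\]\\\-]"

samples = ["a/b", "a # b", "a+b", "a|b", r"a\b", "a-b"]

# Replace every character from the class with a single space.
cleaned = [re.sub(pattern, " ", s) for s in samples]
print(cleaned)
```

Note that `|` is not a separator inside `[...]`; it is just one more character in the set, so listing it once is enough.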

Emma
0

Updated to: b1='[?:\-\'\\\|.()\[\]{}/]'

regex demo

Code:

# coding=utf8
# the above tag defines encoding for this document and is for Python 2.x compatibility

import re

regex = r"[?:\-'\\\|.()\[\]{}/]"

test_str = "'a/b','a # b','a+b'"

subst = " "

# You can manually specify the number of replacements by changing the 4th argument
result = re.sub(regex, subst, test_str, 0, re.MULTILINE)

if result:
    print(result)

# Note: for Python 2.7 compatibility, use ur"" to prefix the regex and u"" to prefix the test string and substitution.
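To write the result back to a data frame column, one possible approach is to use the vectorized `Series.str.replace` with the same pattern instead of `re.sub`, since `re.sub` expects a single string and raises `TypeError` when given a whole Series. This is a sketch that assumes pandas is installed; `regex=True` is passed explicitly because newer pandas releases treat the pattern as a literal string by default.

```python
import pandas as pd

# Same character class as in the answer above.
regex = r"[?:\-'\\\|.()\[\]{}/]"

a = pd.DataFrame({"title": ["a/b", "a # b", "a+b"]})

# Apply the regex element-wise and store the result as a new column.
a["title1"] = a["title"].astype(str).str.replace(regex, " ", regex=True)
print(a)
```

Only the first row changes here, because `#` and `+` are not part of this character class.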
Oliver Hao
  • Hi, thanks a lot. But I need to save the result back to the data frame as a new column, so it is different from your answer, and I do not know how to revise it. Could you please take a look at my edited question above? Thanks – Feng Chen Aug 07 '19 at 01:52
0

I found the answer myself. The last pattern does not work because some special characters must be escaped. It should be:

b1="[?|:|\-|\–|\'|\\|.|\(|\)|\[|\]|\{|\}|/|#|+|,|;|_|\"|&|@|!|$|%|\|]"

That is, put \ in front of the special characters.
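A likely explanation, sketched with the standard `re` module: in the original long pattern, the first unescaped `]` closes the character class early, so the remainder `|{|}|/]` is read as alternatives outside the class, and `/` only matches when it is followed by `]`. The names `chars` and `fixed` are hypothetical; building the class with `re.escape` is one way to avoid escaping by hand, not necessarily the only one.

```python
import re

# The original long pattern from the question, copied verbatim.
b1 = '[?|:|-|\'|\\|.|(|)|[|]|{|}|/]'

# The class ends at the first unescaped "]", so "/" alone is not matched...
unchanged = re.sub(b1, " ", "a/b")
# ...but the two-character sequence "/]" is, via the last alternative.
odd_match = re.sub(b1, " ", "a/]b")

# Hypothetical fix: list the characters once and let re.escape add
# whatever backslashes are needed inside the class.
chars = "?:-'\\|.()[]{}/"
fixed = "[" + re.escape(chars) + "]"

replaced = re.sub(fixed, " ", "a/b")
brackets = re.sub(fixed, " ", "a[b]{c}")
print(unchanged, odd_match, replaced, brackets, sep="\n")
```

So the length of the pattern string is not the issue; the placement of `]` is.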

Feng Chen