How to replace my customized characters with ' ' in Python?

Question

I am trying to replace my own customized characters with ' '. Here is what confuses me:

Replacing a single character works:

import pandas as pd

a=pd.DataFrame({'title':['a/b','a # b','a+b']})
a.loc[:,'title1']=a.loc[:,'title'].astype(str).str.replace('/',' ')
a

The result is:

   title title1
0    a/b    a b
1  a # b  a # b
2    a+b    a+b

If I use a short pattern that contains several characters, it also works:

b2='[?|:|-|\'|\\|/]'
a=pd.DataFrame({'title':['a/b','a # b','a+b']})
a.loc[:,'title1']=a.loc[:,'title'].astype(str).str.replace(b2,' ')
a

The result is:

   title title1
0    a/b    a b
1  a # b  a # b
2    a+b    a+b

But when I use a longer pattern, nothing changes:

b1='[?|:|-|\'|\\|.|(|)|[|]|{|}|/]'
a=pd.DataFrame({'title':['a/b','a # b','a+b']})
a.loc[:,'title1']=a.loc[:,'title'].astype(str).str.replace(b1,' ')
a

The result is:

   title title1
0    a/b    a/b
1  a # b  a # b
2    a+b    a+b

You can see that in the first two examples, / is replaced with ' '. But in the last one the replacement does not happen, and I do not know why. Is there a limit on the length of the pattern string? Or is there a better way that I am not aware of?

Update

Thanks a lot @Oliver Hao. But what I want is to do this for one (or more) columns in a data frame, then save the result back to the data frame as a new column. So when I try:

regex = r"[?:\-'\\\|.()\[\]{}/]"
a.loc[:,'title1']=re.sub(regex," ",a.loc[:,'title'],0,re.MULTILINE)

I get this error:

Traceback (most recent call last):
  File "<input>", line 1, in <module>
  File "C:\Users\fefechen\AppData\Local\Programs\Python\Python37\lib\re.py", line 192, in sub
    return _compile(pattern, flags).sub(repl, string, count)
TypeError: expected string or bytes-like object

I have not used Python. Check whether this is a Python version issue: the test code I gave is for 2.x, and you are using 3.x. — Oliver Hao, Aug 07 '19 at 05:50
[Look at this.](https://stackoverflow.com/questions/11475885/python-replace-regex) — Oliver Hao, Aug 07 '19 at 05:54

score 1 · Accepted Answer · answered Sep 01 '19 at 01:24

This expression might also work,

b1="[|,.:;+–_#&@!$%()[\]{}?'\"\/\\-]"

with fewer escapes.
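
To save the cleaned values back into the data frame as a new column, which is what the question's update asks for, a pattern like this can be passed to Series.str.replace instead of re.sub, because re.sub accepts a single string and not a whole Series. Below is a minimal sketch, assuming pandas is installed. The character class is a shorter, illustrative raw-string subset rather than the exact expression above, and regex=True is passed explicitly because newer pandas versions treat the pattern as a literal string by default.

```python
import pandas as pd

a = pd.DataFrame({'title': ['a/b', 'a # b', 'a+b']})

# Illustrative character class (an assumption, not the exact pattern above).
# A raw string keeps the backslashes intact for the regex engine;
# [ and ] are escaped so they are matched as literal brackets.
pattern = r"[?:\-'\\|.()\[\]{}/#+]"

# .str.replace works element by element on the column;
# regex=True makes pandas interpret the pattern as a regular expression.
a['title1'] = a['title'].astype(str).str.replace(pattern, ' ', regex=True)

print(a)
```

With this pattern, 'a/b' and 'a+b' both become 'a b', and the '#' in 'a # b' is replaced by a space as well.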

Emma

score 0 · Answer 2 · answered Aug 07 '19 at 01:35

Updated to: b1='[?:\-\'\\\|.()\[\]{}/]'

Code:

# coding=utf8
# the above tag defines encoding for this document and is for Python 2.x compatibility

import re

regex = r"[?:\-'\\\|.()\[\]{}/]"

test_str = "'a/b','a # b','a+b'"

subst = " "

# count=0 (the default) replaces every match; set a positive count to limit the number of replacements
result = re.sub(regex, subst, test_str, count=0, flags=re.MULTILINE)

if result:
    print(result)

# Note: for Python 2.7 compatibility, use ur"" to prefix the regex and u"" to prefix the test string and substitution.

Hi, thanks a lot. But I need to save the result back to the data frame as a new column, so my case is different from your answer, and I do not know how to adapt it. Could you please take a look at my edited question above? Thanks — Feng Chen, Aug 07 '19 at 01:52

score 0 · Answer 3 · answered Aug 07 '19 at 03:03

I found the answer myself. The last pattern does not work because some special characters in it need to be escaped. It should be:

b1="[?|:|\-|\–|\'|\\|.|\(|\)|\[|\]|\{|\}|/|#|+|,|;|_|\"|&|@|!|$|%|\|]"

That is, put a \ in front of the special characters.
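
A likely explanation of why the original long pattern left 'a/b' untouched: the first unescaped ] closes the character class early, so the rest of the string is read as ordinary alternation and the / is only matched when it is directly followed by ]. The sketch below uses only the standard re module; the names broken and fixed are illustrative, and fixed is one possible escaped form, not the only one.

```python
import re

# The long pattern from the question, unchanged.
# The character class ends at the first unescaped ']', so the regex is read as:
#   [?|:|-|'|\|.|(|)|[|]   OR   {   OR   }   OR   /]
broken = '[?|:|-|\'|\\|.|(|)|[|]|{|}|/]'

# One possible fix: escape [ and ] inside the class and drop the pipes.
fixed = r"[?:\-'\\|.()\[\]{}/]"

print(repr(re.sub(broken, ' ', 'a/b')))   # '/' alone is not matched
print(repr(re.sub(broken, ' ', 'a/]b')))  # '/]' is matched as a unit
print(repr(re.sub(fixed, ' ', 'a/b')))    # '/' is matched by the class
```

The first call returns the input unchanged, which matches the result shown in the question, while the second shows that the slash only matches as part of '/]'.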

Feng Chen

The pipe between characters in a character class is useless. – Toto Sep 01 '19 at 10:13
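
To illustrate the comment above: inside a character class, | is not an "or" operator but just another literal character, so adding it between the members only means that | itself gets replaced too. A small sketch with the standard re module; the two pattern names are made up for the example.

```python
import re

with_pipes = r"[?|:|/]"    # the class contains ?, :, / and also a literal |
without_pipes = r"[?:/]"   # the class contains only ?, : and /

text = 'a|b/c'

# The pipe in the text is replaced only when | is a member of the class.
print(re.sub(with_pipes, ' ', text))
print(re.sub(without_pipes, ' ', text))
```

The first call also replaces the | in the text, while the second leaves it in place and replaces only the /.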
