
I have already looked at escaping the characters but that didn't help me.

I have a pandas dataframe with a column called Page. This is a list of webpage names (not URLs).

Currently they are written in three formats:

1. home ? home ? pagename1
2. home | home | pagename2
3. home home pagename3

I would like them all to be formatted like number 3.

I am trying to remove characters from the string objects in this column but leave the rest of the text.

I have used this:

df.loc[df['Page'].str.replace(('\?|\|'), ''), Regex=True, Inplace=True]

but I get output:

File "<ipython-input-80-2c616b171200>", line 2
df['page']=df.loc[df['Page'].str.replace(('\?|\\'), ''), Regex=True, Inplace=True]
SyntaxError: invalid syntax

same output if I use this:

df['page']=df.loc[df['Page'].str.replace(('\?|\|'), ''), Regex=True, Inplace=True]

I've resorted to trying other options such as:

x=pd.Series['Page']
x.str.replace('\?|\|','',regex = True, inplace=True)

but this gave me:

TypeError                                 Traceback (most recent call last)
<ipython-input-70-6563d5fa5d40> in <module>
      1 #clean up page names
----> 2 x=pd.Series['Page']
      3 x.str.replace('\?|\|','',regex = True, inplace=True)

TypeError: 'type' object is not subscriptable

please can anyone help?

thank you

Mizz

wwnde
Mizz H
  • Does this answer your question? [How to replace text in a string column of a Pandas dataframe?](https://stackoverflow.com/questions/28986489/how-to-replace-text-in-a-string-column-of-a-pandas-dataframe) –  Jul 19 '22 at 10:27

2 Answers

Data

df=pd.DataFrame({'text':['home ? home ? pagename1','home | home | pagename2','home home pagename3']})

                   text
0  home ? home ? pagename1
1  home | home | pagename2
2      home home pagename3

Solution

Use Series.str.replace(regex, replacement)

df.text=df.text.str.replace(r'\s*[?|]\s*',' ',regex=True)

                  text
0  home home pagename1
1  home home pagename2
2  home home pagename3
wwnde
  • So the \s is to tell it to look for 'whitespace ? whitespace' and 'whitespace | whitespace'? I had to look it up, so I'm just adding it for anyone following with a similar question. Thanks for your help – Mizz H Sep 27 '20 at 09:56
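
On the \s question in the comment above, here is a small sketch using Python's built-in re module (the sample string is taken from the question; the variable names are only illustrative). It shows that \s inside square brackets is part of a character class that matches one character at a time, whereas \s* written around the class matches the whole 'whitespace ? whitespace' run as one unit:

```python
import re

s = "home ? home ? pagename1"

# Character class: matches ONE character that is either whitespace or '?'.
# The ' ', '?' and ' ' are each replaced separately, so three spaces remain.
per_char = re.sub(r"[\s?]", " ", s)

# Sequence: optional whitespace, then '?' or '|', then optional whitespace.
# The whole run is replaced once, by a single space.
as_unit = re.sub(r"\s*[?|]\s*", " ", s)

print(repr(per_char))
print(repr(as_unit))
```

Inside square brackets, ? and | are literal characters, so they do not need a backslash there.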
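You are getting a syntax error because regex=True (yes, all lowercase) has to be passed as an argument to str.replace(), inside its parentheses, rather than inside the df.loc[...] brackets. Also drop inplace: str.replace() has no such parameter and returns a new Series, so assign the result back to the column. The code below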

# Remove the '?' and '|' characters from the Page column
df['Page'] = df['Page'].str.replace(r'\?|\|', '', regex=True)
print(df)

gets this result

0  home  home  pagename1
1  home  home  pagename2
2    home home pagename3
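
Since the question asks for every row to look like format 3, with single spaces, one possible follow-up is to collapse the leftover double spaces as well. This is only a sketch, assuming the column is named Page as in the question and rebuilding the three sample rows by hand:

```python
import pandas as pd

# Sample rows copied from the question.
df = pd.DataFrame({"Page": ["home ? home ? pagename1",
                            "home | home | pagename2",
                            "home home pagename3"]})

df["Page"] = (
    df["Page"]
    .str.replace(r"[?|]", "", regex=True)   # drop the '?' and '|' characters
    .str.replace(r"\s+", " ", regex=True)   # collapse runs of whitespace to one space
    .str.strip()                            # trim any leading/trailing space
)

print(df["Page"].tolist())
```

Chaining the calls keeps each step readable; the result is assigned back to the column, so the dataframe itself is updated.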
  • Thank you Sophia, it works! I had spent hours trying to crack it. I had to remove the loc part of my code as well, so the final result is: df['Page']=df['Page'].str.replace(('\?|\|'), '', regex=True) followed by df.head() – Mizz H Sep 27 '20 at 09:46
  • Is it possible to use loc so that, instead of changing it to a Series, it just updates the dataframe? I assume that if it's a Series then it doesn't amend the df and is just a subset to use? – Mizz H Sep 27 '20 at 09:57
  • No problem. This solution should amend the df itself. If I wanted to create a separate column without modifying the original, I would create another variable instead of using df['page']. – Sophia Song Sep 28 '20 at 23:18
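
To illustrate the comments above, here is a rough sketch of the three behaviours discussed: str.replace() on its own leaves the dataframe untouched, assigning to a new column keeps the original, and loc can update selected rows in place. The column name Page_clean and the variables cleaned, before and mask are hypothetical names chosen for this example:

```python
import pandas as pd

# Sample rows copied from the question.
df = pd.DataFrame({"Page": ["home ? home ? pagename1",
                            "home | home | pagename2",
                            "home home pagename3"]})

pattern = r"\s*[?|]\s*"

# str.replace() returns a new Series; df is unchanged until we assign it back.
cleaned = df["Page"].str.replace(pattern, " ", regex=True)
before = df["Page"].tolist()  # still the original strings

# Option 1: keep the original column and store the result in a new one.
df["Page_clean"] = cleaned

# Option 2: use loc with a boolean mask to update only the affected rows.
mask = df["Page"].str.contains(r"[?|]", regex=True)
df.loc[mask, "Page"] = df.loc[mask, "Page"].str.replace(pattern, " ", regex=True)

print(df)
```

Here loc is used for assignment with a row mask and a column label, which is different from wrapping the whole str.replace() call inside df.loc[...] as in the question.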