
I just can't seem to get this right.

I now have a Pandas series called text

It consists of 105 rows of article text.

I want to loop through each of these rows and remove certain characters, such as “, ” and –. Here's my code:

cleaned = []
for i in text:
    i.replace('“', '')
    i.replace('”', '')
    i.replace('–', '')
    cleaned.append(i)

However, when I try to print out the text in this cleaned list, the characters above aren't removed. Where am I going wrong? Thanks

for i in cleaned:
    print(i)
cget

2 Answers

1

str.replace() returns a new string with the replacements applied. Strings are immutable, so it never modifies the original; you have to assign the result back, like this:

cleaned = []
for i in text:
    i = i.replace('“', '')
    i = i.replace('”', '')
    i = i.replace('–', '')
    cleaned.append(i)
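
As a minimal sketch of the same idea, the snippet below first shows that a bare `replace` call leaves the string untouched, then removes all three characters in one pass with `str.translate`. The sample list is a made-up stand-in for the `text` Series from the question, not real data.

```python
# Hypothetical stand-in for the pandas Series in the question;
# any iterable of strings behaves the same way here.
text = ['“Hello” – world', 'no special characters']

original = text[0]
original.replace('“', '')   # the returned string is discarded
unchanged = original        # still contains the opening quote

# With a third argument, str.maketrans maps each listed character to None,
# so translate() deletes all of them in a single pass.
strip_table = str.maketrans('', '', '“”–')
cleaned = [row.translate(strip_table) for row in text]

for row in cleaned:
    print(row)
```

If `text` really is a pandas Series, the same table could presumably be applied with `text.map(lambda row: row.translate(strip_table))`, which returns a new Series rather than a list.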
1

Use regular expressions to clean your text. The syntax can be a little confusing at first, but it's much more powerful once your text cleaning gets more involved.

import re

cleaned = []
for i in text:
    i = re.sub(r'\“', '', i)
    i = re.sub(r'\”', '', i)
    i = re.sub(r'–', '', i)
    cleaned.append(i)
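
The three substitutions can also be folded into one pattern with a character class. This is only a sketch: the sample list stands in for the `text` Series from the question, and the name `pattern` is introduced here for illustration.

```python
import re

# Hypothetical stand-in for the Series in the question.
text = ['“Quoted” text – with a dash', 'plain text']

# A character class matches any one of the listed characters,
# so a single pass removes both curly quotes and the en dash.
pattern = re.compile('[“”–]')

cleaned = [pattern.sub('', row) for row in text]

for row in cleaned:
    print(row)
```

Compiling the pattern once outside the loop avoids repeating the pattern lookup for every row, and adding another character to strip only means extending the class.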

You can also remove every character that isn't a letter, digit, or underscore (note that this strips spaces and punctuation too) using

i = re.sub(r'\W', '', i)
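
To make the effect of `\W` concrete, here is a small sketch on a made-up sample string. It also shows one possible alternative, `[^\w\s]`, which keeps whitespace so the words don't run together; that variant is a suggestion, not part of the original answer.

```python
import re

# Hypothetical sample string, chosen to include spaces, punctuation,
# curly quotes, an underscore, and a digit.
sample = 'Hello, “big” world_1!'

# \W matches anything that is not a letter, digit, or underscore,
# which includes spaces, so the words are glued together.
no_nonword = re.sub(r'\W', '', sample)

# [^\w\s] matches anything that is neither a word character nor
# whitespace, so spaces survive and only punctuation is removed.
keep_spaces = re.sub(r'[^\w\s]', '', sample)

print(no_nonword)
print(keep_spaces)
```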

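Remember that \ is the escape character in regular expressions; the r'' prefix keeps Python from interpreting the backslash before the regex engine sees it.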

rnkv2