0

I'm trying to remove some characters that I don't want from strings in Python. As far as I've learned, the replace function should work just fine, but it doesn't :(

By the way, this is just a simple word count function.

Code

fileName = "simple.txt"
inputFile = open(fileName, "rb")

wordCount = {}

for line in inputFile:
    splitted = line.split(" ")
    for word in splitted:
        word.replace('\n','') #It's not removing this chars from words
        word.replace('?','')  #Nor this ones

        if word in wordCount:
            wordCount[word] = wordCount[word] + 1
        else:
            wordCount[word] = 1

print wordCount

Input

How many roads must a man walk down Before you call him a man? How many seas must a white dove sail Before she sleeps in the sand? Yes, how many times must the cannon balls fly Before they're forever banned? The answer my friend is blowin' in the wind The answer is blowin' in the wind.

Yes, how many years can a mountain exist Before it's washed to the sea? Yes, how many years can some people exist Before they're allowed to be free? Yes, how many times can a man turn his head Pretending he just doesn't see? The answer my friend is blowin' in the wind The answer is blowin' in the wind.

Yes, how many times must a man look up Before he can really see the sky? Yes, how many ears must one man have Before he can hear people cry? Yes, how many deaths will it take till he knows That too many people have died? The answer my friend is blowin' in the wind The answer is blowin' in the wind.

Output

{'ears': 1, 'Yes,': 7, 'allowed': 1, 'knows\n': 1, 'sleeps': 1, 'people': 3, 'seas': 1, 'is': 6, '\n': 2, 'some': 1, 'it': 1, 'walk': 1, 'How': 2, 'see': 1, "blowin'": 6, 'have': 1, 'in': 7, 'roads': 1, 'up\n': 1, 'free?\n': 1, 'cry?\n': 1, 'really': 1, 'one': 1, 'mountain': 1, 'he': 4, 'just': 1, 'to': 2, "it's": 1, 'deaths': 1, 'washed': 1, 'head\n': 1, 'how': 7, 'down\n': 1, 'call': 1, 'take': 1, 'Pretending': 1, 'answer': 6, 'have\n': 1, 'white': 1, 'must': 5, "doesn't": 1, 'friend': 3, 'can': 5, 'be': 1, 'sail\n': 1, 'his': 1, 'wind\n': 3, 'sea?\n': 1, 'cannon': 1, 'till': 1, 'see?\n': 1, 'wind.\n': 3, 'man?\n': 1, 'you': 1, 'banned?\n': 1, 'hear': 1, 'too': 1, 'sky?\n': 1, 'The': 6, 'sand?\n': 1, 'dove': 1, 'him': 1, 'man': 4, 'a': 6, "they're": 2, 'forever': 1, 'balls': 1, 'look': 1, 'fly\n': 1, 'many': 10, 'exist\n': 2, 'times': 3, 'will': 1, 'turn': 1, 'died?\n': 1, 'she': 1, 'the': 10, 'years': 2, 'my': 3, 'That': 1, 'Before': 7}

Thank you!

David Dias

4 Answers

5

.replace() returns the altered string. Store that return value:

word = word.replace('\n','') 

You could chain the replace calls:

word = word.replace('\n','').replace('?','')

Strings are immutable; they cannot be altered in place.

Last but not least: use collections.Counter() to count words instead. It offers many additional features that make working with frequency counts easier:

from collections import Counter

with open(fileName, "rb") as inputFile:
    wordCount = Counter(w.replace('?', '')
        for line in inputFile for w in line.split())

which creates your wordCount structure in a single expression. Note that .split() with no arguments splits on any run of whitespace, so it effectively strips extra whitespace and newlines for you.
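As a self-contained sketch of the same idea, the snippet below counts words from an in-memory list of lines rather than from simple.txt. The `lines` sample, the `word_count` name, and the Python 3 `print()` calls are assumptions made for illustration; they are not part of the original question.

```python
from collections import Counter

# Hypothetical sample standing in for the lines of simple.txt.
lines = [
    "How many roads must a man walk down\n",
    "Before you call him a man?\n",
]

# split() with no arguments splits on any whitespace, so the trailing
# newlines never end up in the words; replace() drops the question mark.
word_count = Counter(
    w.replace('?', '')
    for line in lines
    for w in line.split()
)

print(word_count['man'])  # 2
print(word_count['a'])    # 2
```

Because 'man?' is cleaned before it is counted, it lands in the same bucket as 'man'.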

Note that if you are removing punctuation from the start or end of words, you should really use the .strip() method instead:

wordCount = Counter(w.strip('.,:?')
    for line in inputFile for w in line.split())

where the .strip('.,:?') will remove any and all characters at the start or end that are listed in the argument.
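To make the difference between .strip() and .replace() concrete, here is a small sketch. The `PUNCTUATION` constant and the sample `words` list are hypothetical names chosen for this example, and the `print()` calls assume Python 3.

```python
PUNCTUATION = '.,:?'  # the same characters passed to strip() above

# strip() only removes the listed characters from the two ends of a string.
words = ["man?", "wind.", "Yes,", "blowin'", "a.b."]
stripped = [w.strip(PUNCTUATION) for w in words]
print(stripped)  # ['man', 'wind', 'Yes', "blowin'", 'a.b']

# replace() removes every occurrence, including ones in the middle.
replaced = "a.b.".replace('.', '')
print(replaced)  # ab
```

The apostrophe in "blowin'" survives because it is not among the characters passed to .strip().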

Martijn Pieters
1

string.replace is not an in-place operation; it returns a value (the new string). Therefore, you need to do:

word = word.replace('\n', '')

One more thing:

string.split() without any arguments automatically splits on all whitespace, so if you remove the " ", you won't have to do .replace('\n', '') in the first place.
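A short sketch of the difference between the two calls, assuming Python 3 `print()` and a sample `line` taken from the input text; the variable names are only illustrative.

```python
line = "Before you call him a man?\n"

# Splitting on a single space keeps the newline attached to the last word.
on_space = line.split(" ")
print(on_space[-1] == "man?\n")  # True

# Splitting with no argument treats any run of whitespace as a separator
# and drops leading and trailing whitespace.
on_whitespace = line.split()
print(on_whitespace[-1])  # man?

# A blank line behaves differently as well.
blank_on_space = "\n".split(" ")
blank_on_whitespace = "\n".split()
print(blank_on_space)       # ['\n']
print(blank_on_whitespace)  # []
```

The blank-line case is where the '\n' key in the question's output comes from: the two empty lines between verses each yield a single '\n' "word" when split on " ".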

Joel Cornett
1

Strings in Python are immutable. This means that string methods do not modify the string itself; instead, methods like replace return new string values that you then have to store in your variables.

More concretely, this means that given a string s:

s = 'Some string'

Then

s.replace('string','hello')

returns the string 'Some hello', but that value is discarded and s is still 'Some string'. To change s, you have to store the returned value back in s explicitly, like so:

s = s.replace('string','hello')

Now s is 'Some hello'.

James O'Doherty
0

As I see it, a gentle approach is to write a function:

def remove_words(word, *to_replace):
    # Remove every substring passed in to_replace from word.
    for replace_word in to_replace:
        word = word.replace(replace_word, '')
    return word
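For illustration, this is how such a helper might be called on one of the words from the question. The function is repeated here with the indentation Python requires so that the snippet runs on its own; the `cleaned` name and the Python 3 `print()` call are assumptions.

```python
def remove_words(word, *to_replace):
    """Return word with every substring in to_replace removed."""
    for replace_word in to_replace:
        # replace() returns a new string, so rebind word each time.
        word = word.replace(replace_word, '')
    return word

cleaned = remove_words("banned?\n", "\n", "?")
print(cleaned)  # banned
```

Each pass rebinds `word` to the string returned by .replace(), which is the step missing from the code in the question.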
Jingkai He