I have created a dictionary that I use to bring a variety of words to their base form.

dictionary = {'sunny': 'sun', 'banking': 'bank'}

def stemmingWords(sentence, dictionary):
    for word in sentence.split():
        temp = []
        if word in dictionary:
            word = dictionary[word]
            temp.append(word)
    sentence = ' '.join(temp)
    return(sentence)

If I print the separate words, it seems to work. However, when I pass in a whole sentence and expect an updated version of that sentence back, something goes wrong. For example, if I do:

sentence = "the sun is shining"
new_sentence = stemmingWords(sentence, dictionary)
print(new_sentence)

This gives me "shining", while I am looking for "the sunny is shining".

Any thoughts on what goes wrong here?

Jithin Pavithran
user181796

2 Answers

First, your dictionary is the wrong way round for the output you want. Reverse it:

dictionary = {'sun': 'sunny', 'bank': 'banking'}

A simple way to do this without retyping it would be:

dictionary = {v:k for k,v in dictionary.items()}

Note that if several words map to the same word, inverting the dictionary won't work, because one entry silently overwrites the other. You have to resolve the ambiguity first, for example by writing the dictionary manually:

dictionary = {'sun': 'sunny', 'sunn': 'sunny', 'bank': 'banking'}
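
To make the ambiguity concrete, here is a minimal sketch that groups keys by value before inverting, so collisions become visible. The `stems` mapping (with the extra entry 'sunnier') and the helper name `invert_with_collisions` are hypothetical, introduced only for illustration.

```python
from collections import defaultdict

def invert_with_collisions(mapping):
    """Invert a dict, collecting every original key under its value."""
    inverted = defaultdict(list)
    for key, value in mapping.items():
        inverted[value].append(key)
    return dict(inverted)

# Hypothetical data: two words share the base form 'sun'.
stems = {'sunny': 'sun', 'sunnier': 'sun', 'banking': 'bank'}

# Naive inversion keeps only the last key seen for each value.
naive = {v: k for k, v in stems.items()}

# Grouped inversion keeps all candidates, so ambiguous entries can be reviewed.
grouped = invert_with_collisions(stems)
ambiguous = {base: words for base, words in grouped.items() if len(words) > 1}

print(naive)
print(ambiguous)
```

Any base form listed in `ambiguous` needs a manual decision about which word it should expand to.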

Then split and rebuild the string using a list comprehension and dictionary.get, which returns the original word if it is not in the dictionary:

def stemmingWords(sentence,dictionary):
    return " ".join([dictionary.get(w,w) for w in sentence.split()])

print(stemmingWords("the sun is shining",dictionary))

result:

the sunny is shining

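Note the deliberate ([]) when using join. In this case it is faster to pass the list comprehension explicitly than a generator expression, because join (in CPython) has to build a sequence from its argument before joining anyway.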
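
A rough way to check the list-versus-generator claim on your own machine is a small timeit comparison. This is only a sketch: the names `lookup`, `join_list` and `join_gen` are made up for the example, and the timings depend on the interpreter and the input size.

```python
import timeit

lookup = {'sun': 'sunny', 'bank': 'banking'}
# Repeat the sentence so the timing difference is measurable.
words = "the sun is shining".split() * 1000

def join_list():
    # join receives a ready-made list
    return " ".join([lookup.get(w, w) for w in words])

def join_gen():
    # join receives a generator expression and has to consume it first
    return " ".join(lookup.get(w, w) for w in words)

# Both variants must produce the same string.
assert join_list() == join_gen()

print("list comprehension:", timeit.timeit(join_list, number=200))
print("generator:         ", timeit.timeit(join_gen, number=200))
```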

Jean-François Fabre

The issues with your function are the following:

  1. The dictionary keys and values are swapped.
    Workaround: see https://stackoverflow.com/a/1031878/4954434
  2. append() always adds to the end of the list,
    so the position of each word is not preserved on its own.
  3. temp does not have all the words: it is reset on every iteration, and words that are not in the dictionary are never added to it.
The following function should work.

def stemmingWords(sentence, dictionary):
    # Invert the mapping so each value becomes a lookup key
    # (Python 3: use items(); iteritems() exists only in Python 2)
    dictionary = {v: k for k, v in dictionary.items()}

    # Replace matching words in place so the word order is preserved
    words = sentence.split()
    for i, word in enumerate(words):
        if word in dictionary:
            words[i] = dictionary[word]
    return ' '.join(words)

While I hope this answer will help newbies, Jean-François Fabre's answer is far better.

Jithin Pavithran