I have a data frame as below.
ID  Word      Synonyms
------------------------
1   drove     drive
2   office    downtown
3   everyday  daily
4   day       daily
5   work      downtown
I'm reading a sentence and would like to replace words in that sentence with their synonyms as defined above. Here is my code:
import nltk
import pandas as pd
import string

sdf = pd.read_excel('C:\synonyms.xlsx')
sd = sdf.apply(lambda x: x.astype(str).str.lower())
words = 'i drove to office everyday in my car'
#######
def tokenize(text):
    text = ''.join([ch for ch in text if ch not in string.punctuation])
    tokens = nltk.word_tokenize(text)
    synonym = synonyms(tokens)
    return synonym

def synonyms(words):
    for word in words:
        if(sd[sd['Word'] == word].index.tolist()):
            idx = sd[sd['Word'] == word].index.tolist()
            word = sd.loc[idx]['Synonyms'].item()
        else:
            word
    return word

print(tokenize(words))
The code above tokenizes the input sentence. I would like to achieve the following output:
In: i drove to office everyday in my car
Out: i drive to downtown daily in my car
But the output I get is
Out: car
If I skip the synonyms function, then my output has no issues and is split into individual words. I am trying to understand what I'm doing wrong in the synonyms function. Also, please advise if there is a better solution to this problem.
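For reference, here is a minimal sketch of the behaviour I'm after. It collects every token instead of returning only the loop variable, which is what seems to leave just the last word ("car") in the code above. The names replace_synonyms and SYNONYMS are hypothetical, the table is hard-coded as a plain dict rather than read from the Excel file, and str.split() stands in for nltk.word_tokenize, so this is an illustration under those assumptions, not a finished solution:

```python
import string

# Hypothetical in-memory copy of the Word -> Synonyms table above.
# With the DataFrame from the question, it could be built as
# dict(zip(sd['Word'], sd['Synonyms'])).
SYNONYMS = {
    'drove': 'drive',
    'office': 'downtown',
    'everyday': 'daily',
    'day': 'daily',
    'work': 'downtown',
}

def replace_synonyms(text, mapping=SYNONYMS):
    # Strip punctuation, as tokenize() does in the question.
    cleaned = ''.join(ch for ch in text if ch not in string.punctuation)
    # Assumption: a plain whitespace split replaces nltk.word_tokenize.
    tokens = cleaned.lower().split()
    # Keep every token, swapping in a synonym when one is defined.
    return ' '.join(mapping.get(tok, tok) for tok in tokens)

print(replace_synonyms('i drove to office everyday in my car'))
```

A dict lookup per token also avoids filtering the whole data frame once for every word.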