Currently I'm working with WordNet based semantic similarity measurement project. As I know below are the steps for computing semantic similarity between two sentences:
- Each sentence is partitioned into a list of tokens.
- Stemming words.
- Part-of-speech disambiguation (or tagging).
- Find the most appropriate sense for every word in a sentence (Word Sense Disambiguation).
- Compute the similarity of the sentences based on the similarity of the pairs of words.
Now I'm at step 3. But I couldn't get the correct output. I'm not very familiar with Python. So I would appreciate your help.
This is my code.
import nltk
from nltk.corpus import stopwords
def get_tokens():
test_sentence = open("D:/test/resources/AnswerEvaluation/Sample.txt", "r")
try:
for item in test_sentence:
stop_words = set(stopwords.words('english'))
token_words = nltk.word_tokenize(item)
sentence_tokenization = [word for word in token_words if word not in stop_words]
print (sentence_tokenization)
return sentence_tokenization
except Exception as e:
print (str(e))
def get_stems():
tokenized_sentence = get_tokens()
for tokens in tokenized_sentence:
sentence_stemming = nltk.PorterStemmer().stem(tokens)
print (sentence_stemming)
return sentence_stemming
def get_tags():
stemmed_sentence = get_stems()
tag_words = nltk.pos_tag(stemmed_sentence)
print (tag_words)
return tag_words
get_tags()
Sample.txt contains the sentences, I was taking a ride in the car. I was riding in the car.