1

Currently I'm working with WordNet based semantic similarity measurement project. As I know below are the steps for computing semantic similarity between two sentences:

  1. Each sentence is partitioned into a list of tokens.
  2. Stemming words.
  3. Part-of-speech disambiguation (or tagging).
  4. Find the most appropriate sense for every word in a sentence (Word Sense Disambiguation).
  5. Compute the similarity of the sentences based on the similarity of the pairs of words.

Now I'm at step 3. But I couldn't get the correct output. I'm not very familiar with Python. So I would appreciate your help.

This is my code.

import nltk
from nltk.corpus import stopwords


def get_tokens():

    test_sentence = open("D:/test/resources/AnswerEvaluation/Sample.txt", "r")

    try:
        for item in test_sentence:
            stop_words = set(stopwords.words('english'))

            token_words = nltk.word_tokenize(item)

            sentence_tokenization = [word for word in token_words if word not in stop_words]
            print (sentence_tokenization)
            return sentence_tokenization

    except Exception as e:
        print (str(e))


def get_stems():

    tokenized_sentence = get_tokens()

    for tokens in tokenized_sentence:
        sentence_stemming = nltk.PorterStemmer().stem(tokens)
        print (sentence_stemming)
        return sentence_stemming


def get_tags():

    stemmed_sentence = get_stems()

    tag_words = nltk.pos_tag(stemmed_sentence)

    print (tag_words)
    return tag_words

get_tags()

Sample.txt contains the sentences, I was taking a ride in the car. I was riding in the car.

Cœur
  • 37,241
  • 25
  • 195
  • 267
Hash
  • 95
  • 3
  • 11

0 Answers0