Using CoreNLP, I am able to easily extract all adjectives and nouns from a given sentence. What I am struggling with is actually extracting phrases out of the sentence.

For example, I have the following sentences:

  1. This person is trust worthy.
  2. This person is non judgemental.
  3. This person is well spoken.

For all these sentences, I want to use NLP to extract the phrases trust worthy, non judgemental, well spoken, and so forth. I want to extract all of these related words.

How do I do this?

Thanks,

Sidhant

3 Answers

I think you first need to think past these specific examples and consider the structure of exactly what you want to extract. For example, in your cases you can use a simple heuristic: find any copular construction and collect the predicate together with all of its modifiers.
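A rough, untested sketch of that heuristic using CoreNLP's dependency parse could look like the following; the set of modifier relations to keep (advmod, amod, compound, neg) is my assumption and will depend on how the parser analyzes your sentences:

import edu.stanford.nlp.ling.CoreAnnotations;
import edu.stanford.nlp.ling.IndexedWord;
import edu.stanford.nlp.pipeline.Annotation;
import edu.stanford.nlp.pipeline.StanfordCoreNLP;
import edu.stanford.nlp.semgraph.SemanticGraph;
import edu.stanford.nlp.semgraph.SemanticGraphCoreAnnotations;
import edu.stanford.nlp.semgraph.SemanticGraphEdge;
import edu.stanford.nlp.util.CoreMap;
import java.util.Properties;

public class CopulaPhrases {

    public static void main(String[] args) {
        // Create a pipeline that produces a dependency parse
        Properties props = new Properties();
        props.setProperty("annotators", "tokenize,ssplit,pos,lemma,depparse");
        StanfordCoreNLP pipeline = new StanfordCoreNLP(props);

        Annotation doc = new Annotation("This person is well spoken. This person is non judgemental.");
        pipeline.annotate(doc);

        for (CoreMap sentence : doc.get(CoreAnnotations.SentencesAnnotation.class)) {
            SemanticGraph deps = sentence.get(SemanticGraphCoreAnnotations.BasicDependenciesAnnotation.class);
            // A "cop" edge marks a copular construction; its governor is the
            // predicate ("spoken", "judgemental", ...).
            for (SemanticGraphEdge edge : deps.edgeIterable()) {
                if ("cop".equals(edge.getRelation().getShortName())) {
                    IndexedWord predicate = edge.getGovernor();
                    StringBuilder phrase = new StringBuilder();
                    // Collect the predicate's modifiers; which relations you want
                    // to keep depends on the parses you actually get.
                    for (IndexedWord child : deps.getChildList(predicate)) {
                        String rel = deps.getEdge(predicate, child).getRelation().getShortName();
                        if (rel.equals("advmod") || rel.equals("amod")
                                || rel.equals("compound") || rel.equals("neg")) {
                            phrase.append(child.word()).append(" ");
                        }
                    }
                    phrase.append(predicate.word());
                    System.out.println(phrase.toString().trim());
                }
            }
        }
    }
}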

If the scope of what you need to extract is larger than that, you can go back to the drawing board and rethink your rules based on the basic linguistic features available in e.g. Stanford CoreNLP or, as another poster has linked, spaCy.

Finally, if you need the ability to generalize to other, unseen examples, you may want to train a classifier (maybe starting with simple logistic regression) by feeding it relevant linguistic features and tagging each token in a sentence as relevant or not relevant.

adamits

For your specific use case, Open Information Extraction seems to be a suitable solution. It extracts triples containing a subject, a relation, and an object. Your relation seems to always be "be" (the infinitive of "is") and your subject seems to always be "person", so we are only interested in the object.

import edu.stanford.nlp.ie.util.RelationTriple;
import edu.stanford.nlp.ling.CoreAnnotations;
import edu.stanford.nlp.ling.CoreAnnotations.TextAnnotation;
import edu.stanford.nlp.pipeline.Annotation;
import edu.stanford.nlp.pipeline.StanfordCoreNLP;
import edu.stanford.nlp.naturalli.NaturalLogicAnnotations;
import edu.stanford.nlp.util.CoreMap;
import java.util.Collection;
import java.util.Properties;

public class OpenIE {

    public static void main(String[] args) {
        // Create the Stanford CoreNLP pipeline
        Properties props = new Properties();
        props.setProperty("annotators", "tokenize,ssplit,pos,lemma,depparse,natlog,openie");
        StanfordCoreNLP pipeline = new StanfordCoreNLP(props);

        // Annotate your sentences
        Annotation doc = new Annotation("This person is trust worthy. This person is non judgemental. This person is well spoken.");
        pipeline.annotate(doc);

        // Loop over sentences in the document
        for (CoreMap sentence : doc.get(CoreAnnotations.SentencesAnnotation.class)) {
            // Get the OpenIE triples for the sentence
            Collection<RelationTriple> triples = sentence.get(NaturalLogicAnnotations.RelationTriplesAnnotation.class);
            // Print the triples
            for (RelationTriple triple : triples) {
                triple.object.forEach(object -> System.out.print(object.get(TextAnnotation.class) + " "));
                System.out.println();
            }
        }
    }
}

The output would be the following:

trust 
worthy 
non judgemental 
judgemental 
well spoken 
spoken 

The OpenIE algorithm may extract multiple triples per sentence. For your use case, the solution might be to just take the triple with the largest number of words in the object.
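If you go that route, a small fragment like the one below (reusing the triples collection from inside the sentence loop of the example above) would keep only the triple whose object has the most words; objectGloss() returns the object as a plain string:

// Inside the sentence loop above: keep only the triple with the longest object.
RelationTriple longest = null;
for (RelationTriple triple : triples) {
    if (longest == null || triple.object.size() > longest.object.size()) {
        longest = triple;
    }
}
if (longest != null) {
    System.out.println(longest.objectGloss());
}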

Another thing to mention is that the object of your first sentence is not extracted "correctly", at least not in the way you want it. This happens because trust is a noun and worthy is an adjective. The easiest solution would be to write it with a hyphen (trust-worthy). Another possible solution is to check the part-of-speech tags and perform some additional steps when you encounter a noun followed by an adjective.
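A rough sketch of that second idea, reusing the sentence object from the loop in the example above (you would also need java.util.List and edu.stanford.nlp.ling.CoreLabel on the import list; the tag check is an assumption and only covers a noun directly followed by an adjective):

// "trust" is tagged NN and "worthy" JJ, so merge a noun followed by an adjective.
List<CoreLabel> tokens = sentence.get(CoreAnnotations.TokensAnnotation.class);
for (int i = 0; i + 1 < tokens.size(); i++) {
    if (tokens.get(i).tag().startsWith("NN") && tokens.get(i + 1).tag().startsWith("JJ")) {
        System.out.println(tokens.get(i).word() + " " + tokens.get(i + 1).word());
    }
}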

Tobias Geiselmann

To check the similarity between phrases, you could use word embeddings such as GloVe. Some NLP libraries, such as spaCy, come with these embeddings included: https://spacy.io/usage/vectors-similarity

Note: spaCy uses cosine similarity on both the token level and the phrase level, and it also offers a convenience similarity function for larger phrases/sentences.

For example (using spaCy and Python):

import spacy

nlp = spacy.load("en_core_web_md")  # load a model that ships with word vectors
doc1 = nlp(u"The person is trustworthy.")
doc2 = nlp(u"The person is non judgemental.")
cosine_similarity = doc1.similarity(doc2)

The resulting cosine_similarity shows how similar the two phrases/words/sentences are, ranging from 0 to 1, where 1 means very similar.