How to extract head nouns from a phrase in Python?

Question

I am working on a keyphrase classification task, and for it I need to extract the head noun from each keyphrase in Python. The little help I found online was not much use, and I am struggling with this.

I guess you're doing some natural language analysis; maybe take a look at [nltk](https://www.nltk.org/). — Plopp, Sep 20 '18 at 11:29
As general advice, I suggest you break the problem down into the sub-problems/tasks you want to solve. In this case, I don't know much about what exactly you want to do. I assume you want to find the first noun in a sentence. You can start by splitting the sentence into words using the regular expression module [1] and then check whether each word is a noun [2]. [1] https://stackoverflow.com/questions/4998629/python-split-string-with-multiple-delimiters [2] https://stackoverflow.com/questions/28033882/determining-whether-a-word-is-a-noun-or-not — Zapnuk, Sep 20 '18 at 11:39
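A minimal sketch of the approach in that comment. It assumes a hypothetical toy noun lexicon (`TOY_NOUNS`) in place of a real noun check such as WordNet or a POS tagger; `split_words` and `first_noun` are illustrative names, not library functions.

```python
import re

# Split on runs of whitespace and common punctuation in one pass
# (the "multiple delimiters" idea from the first linked question).
def split_words(sentence):
    return [w for w in re.split(r"[\s,.;:!?]+", sentence) if w]

# Hypothetical toy lexicon standing in for a real noun check
# (e.g. WordNet lookups or a POS tagger, as in the second linked question).
TOY_NOUNS = {"keyphrase", "classification", "task", "noun", "python"}

def first_noun(sentence, nouns=TOY_NOUNS):
    """Return the first word of the sentence found in the noun lexicon, or None."""
    for word in split_words(sentence):
        if word.lower() in nouns:
            return word
    return None

print(split_words("I am doing a keyphrase classification task."))
print(first_noun("I am doing a keyphrase classification task."))
```

Keep in mind that the first noun is not necessarily the head noun: in "keyphrase classification task" the head is usually taken to be the rightmost noun, "task".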

score 1 · Answer 1 · answered Sep 20 '18 at 11:48

You can use the Stanford Parser through NLTK to get dependency relations, then use the relations that fit your task, such as nn or compound (noun compound modifier). You can take a look at De Marneffe's typed dependencies manual here.

In the manual's example, the noun phrase "oil price futures" has the head "futures" with two compound modifiers, "oil" and "price".
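To make this concrete, here is a minimal sketch that picks the head out of a phrase once you have its dependency triples. It assumes the triples are already available as `(relation, governor, dependent)` tuples; the ones below are written by hand to mirror the "oil price futures" example rather than produced by the Stanford Parser. `phrase_head` is a hypothetical helper, and because it works on words rather than token positions it assumes no word repeats within the phrase.

```python
def phrase_head(tokens, triples):
    """Return the phrase head: the token that is not a dependent of any
    other token inside the phrase (e.g. via compound/nn relations)."""
    in_phrase = set(tokens)
    dependents = {
        dep for _rel, gov, dep in triples
        if gov in in_phrase and dep in in_phrase
    }
    candidates = [tok for tok in tokens if tok not in dependents]
    # A well-formed phrase has exactly one candidate; if there are several,
    # fall back to the rightmost, the usual head position in English compounds.
    return candidates[-1] if candidates else None

# Hand-written triples mirroring the manual's "oil price futures" example.
tokens = ["oil", "price", "futures"]
triples = [("compound", "futures", "oil"), ("compound", "futures", "price")]
print(phrase_head(tokens, triples))  # futures
```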

You can check any sentence's parse tree and dependencies in the Stanford Parser online demo here.

Hope this helps,

Cheers

I have no idea why this answer got a downvote :) Can someone explain? — berkin, Sep 20 '18 at 12:12

score 0 · Answer 2 · edited Mar 18 '22 at 06:32

The underlying task is part-of-speech (PoS) tagging, which falls within the field of Natural Language Processing (NLP). To extract the nouns from a text you can use either NLTK

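import nltk

# One-time downloads of the tokenizer and tagger models
# (newer NLTK releases name them 'punkt_tab' and 'averaged_perceptron_tagger_eng')
nltk.download('punkt')
nltk.download('averaged_perceptron_tagger')

text = 'Your text goes here'

# Penn Treebank noun tags all start with "NN" (NN, NNS, NNP, NNPS)
def is_noun(pos):
    return pos[:2] == 'NN'

# Tokenize the text and keep only the nouns
tokenized = nltk.word_tokenize(text)
nouns = [word for (word, pos) in nltk.pos_tag(tokenized) if is_noun(pos)]
print(nouns)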

or TextBlob

from textblob import TextBlob

# noun_phrases needs TextBlob's corpora: python -m textblob.download_corpora
text = 'Your text goes here'
blob = TextBlob(text)
print(blob.noun_phrases)
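Note that `noun_phrases` returns whole phrases rather than head nouns. A rough heuristic, assuming English noun phrases whose head is the rightmost word (true for compounds like "oil price futures", but not for phrases with post-modifiers such as "head of state"), is to keep the last word of each phrase. The phrase list below is hand-written, not actual TextBlob output, and `last_word_heads` is a hypothetical helper.

```python
def last_word_heads(phrases):
    """Rightmost-word heuristic: treat the last whitespace-separated word
    of each noun phrase as its head noun (empty phrases are skipped)."""
    return [phrase.split()[-1] for phrase in phrases if phrase.split()]

# Hand-written example phrases (not real TextBlob output).
phrases = ["keyphrase classification task", "head noun extraction", "python"]
print(last_word_heads(phrases))  # ['task', 'extraction', 'python']
```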

If you want to learn more about PoS tagging, you may find this post on the official NLTK site very useful.

score 0 · Answer 3 · answered Sep 20 '18 at 11:41

You can POS-tag the sentence with the NLTK package and then keep the tokens whose tags mark them as nouns (or verbs, if you need those too):

import nltk

text = '''I am doing a keyphrase classification task and for this i am working with the head noun extraction from keyphrases in python. The little help available on internet is not of good use. i am struggling with this.'''
pos_tagged_sent = nltk.pos_tag(nltk.tokenize.word_tokenize(text))
print(pos_tagged_sent)

# Keep every noun tag (NN, NNS, NNP, NNPS), not only singular common nouns
nouns = [word for word, tag in pos_tagged_sent if tag.startswith('NN')]

Out:

[('I', 'PRP'),
 ('am', 'VBP'),
 ('doing', 'VBG'),
 ('a', 'DT'),
 ('keyphrase', 'NN'),
 ('classification', 'NN'),
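Penn Treebank noun tags all start with `NN` (`NN`, `NNS`, `NNP`, `NNPS`), but a flat list of nouns does not yet tell you which noun is the head. One simple heuristic, not something NLTK provides, is to group consecutive noun-tagged tokens into runs and take the last token of each run as its head. The tagged input below is hand-written for illustration, and `noun_run_heads` is a hypothetical helper.

```python
def noun_run_heads(tagged):
    """Group consecutive noun-tagged tokens into runs and return
    (run, head) pairs, using the last token of each run as the head."""
    results, run = [], []
    for word, tag in tagged:
        if tag.startswith("NN"):  # NN, NNS, NNP, NNPS
            run.append(word)
        elif run:
            results.append((run, run[-1]))
            run = []
    if run:  # flush a run that ends the input
        results.append((run, run[-1]))
    return results

# Hand-tagged example (Penn Treebank tags), not real tagger output.
tagged = [("a", "DT"), ("keyphrase", "NN"), ("classification", "NN"),
          ("task", "NN"), ("about", "IN"), ("oil", "NN"),
          ("price", "NN"), ("futures", "NNS")]
for run, head in noun_run_heads(tagged):
    print(" ".join(run), "->", head)
```

This prints `keyphrase classification task -> task` and `oil price futures -> futures`. It ignores adjectives and prepositional post-modifiers, so a dependency parse is more reliable when those matter.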
