I am working on a keyphrase classification task, and for this I need to extract the head noun from keyphrases in Python. The little help available on the internet has not been much use, and I am struggling with this.
-
I guess you're doing some natural language analysis; maybe take a look at [nltk](https://www.nltk.org/) – Plopp Sep 20 '18 at 11:29
-
As general advice, I suggest you break the problem into the sub-problems/tasks you want to solve. In this case, I don't know exactly what you want to do; I assume you want to find the first noun in a sentence. You can start by splitting the sentence into words using the regular expression module [1] and then check whether each word is a noun [2]. [1] https://stackoverflow.com/questions/4998629/python-split-string-with-multiple-delimiters [2] https://stackoverflow.com/questions/28033882/determining-whether-a-word-is-a-noun-or-not – Zapnuk Sep 20 '18 at 11:39
3 Answers
You can use the Stanford Parser package in NLTK to get dependency relations, then make those relations work for you, such as nn or compound (noun compound modifier). You can take a look at De Marneffe's typed dependencies manual here.
In the manual, the noun phrase "oil price futures" is a compound with two modifiers ("oil" and "price") and a head ("futures").
You can check any sentence's parse trees and dependencies from Stanford Parser demo interface here.
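To illustrate how the relations can be used once you have them, here is a minimal sketch. It assumes the dependencies have already been obtained and written as (governor, relation, dependent) triples. The triples below are hand-written for "oil price futures" rather than produced by running the parser, and `find_head` is a hypothetical helper name, not part of NLTK or the Stanford tools.

```python
def find_head(tokens, dependencies, modifier_relations=("nn", "compound")):
    """Return the last token that is not a compound modifier of another token."""
    # Every token that is the dependent of a compound relation is a modifier.
    modifiers = {dep for (gov, rel, dep) in dependencies
                 if rel in modifier_relations}
    # Whatever remains is a head candidate; a single compound leaves one.
    candidates = [tok for tok in tokens if tok not in modifiers]
    return candidates[-1] if candidates else None


# Hand-written triples (assumption): both "oil" and "price" modify "futures".
tokens = ["oil", "price", "futures"]
dependencies = [("futures", "nn", "oil"), ("futures", "nn", "price")]

head = find_head(tokens, dependencies)
print(head)
```

Tokens are compared as plain strings here, so a phrase that repeats the same word would need token indices instead.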
Hope this helps,
Cheers

This task is known as part-of-speech (PoS) tagging and falls within the field of Natural Language Processing (NLP). To extract nouns from a text you can use either nltk
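import nltk
# The tokenizer and tagger models must be installed first (see nltk.download())
text = 'Your text goes here'
# Check if noun: Penn Treebank noun tags all start with 'NN' (NN, NNS, NNP, NNPS)
isNoun = lambda pos: pos[:2] == 'NN'
# tokenise text and keep only nouns
tokenized = nltk.word_tokenize(text)
nouns = [word for (word, pos) in nltk.pos_tag(tokenized) if isNoun(pos)]
print(nouns)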
or TextBlob
from textblob import TextBlob
text= 'Your text goes here'
blob = TextBlob(text)
print(blob.noun_phrases)
If you want to learn more about PoS tagging, you may find this post on the official nltk page very useful.
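For the head noun specifically, one common heuristic for English keyphrases is to take the rightmost noun that appears before the first preposition. The sketch below assumes the phrase has already been tagged with Penn Treebank tags in the (word, tag) format that `nltk.pos_tag` returns. The tagged phrases are hand-written for illustration, and `head_noun` is a hypothetical helper name. This is a heuristic, not a parser, so it can be wrong on unusual phrases.

```python
def head_noun(tagged_tokens):
    """Return the rightmost noun before the first preposition, or None."""
    head = None
    for word, pos in tagged_tokens:
        # Once a noun has been seen, a preposition ("IN") ends the base phrase.
        if pos == "IN" and head is not None:
            break
        # Any noun tag (NN, NNS, NNP, NNPS) replaces the previous candidate.
        if pos.startswith("NN"):
            head = word
    return head


# Hand-tagged examples (assumption: a tagger would give similar tags).
compound = [("head", "NN"), ("noun", "NN"), ("extraction", "NN")]
with_preposition = [("classification", "NN"), ("of", "IN"), ("keyphrases", "NNS")]

print(head_noun(compound))
print(head_noun(with_preposition))
```

In practice you would pass `nltk.pos_tag(nltk.word_tokenize(phrase))` as the argument instead of the hand-written lists.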

You can apply part-of-speech tagging to the sentence with the NLTK toolkit and then extract the words whose tags correspond to nouns, verbs, and so on.
import nltk
text = '''I am doing a keyphrase classification task and for this i am working with the head noun extraction from keyphrases in python. The little help available on internet is not of good use. i am struggling with this.'''
# Tag every token with its part of speech
pos_tagged_sent = nltk.pos_tag(nltk.tokenize.word_tokenize(text))
# 'NN' matches singular common nouns only; use tag[1].startswith('NN') to include plural and proper nouns
nouns = [tag[0] for tag in pos_tagged_sent if tag[1] == 'NN']
Out (pos_tagged_sent, truncated):
[('I', 'PRP'),
('am', 'VBP'),
('doing', 'VBG'),
('a', 'DT'),
('keyphrase', 'NN'),
('classification', 'NN'),
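To collect both nouns and verbs in one pass, the tags can be grouped by their prefix. This is a minimal sketch that reuses the tagged pairs shown in the output above, so it runs without NLTK. `words_by_category` and `CATEGORY_PREFIXES` are hypothetical names introduced for illustration.

```python
from collections import defaultdict

# Penn Treebank noun tags start with "NN" and verb tags with "VB".
CATEGORY_PREFIXES = {"noun": "NN", "verb": "VB"}


def words_by_category(tagged_tokens):
    """Group words by coarse category using the prefix of their PoS tag."""
    grouped = defaultdict(list)
    for word, pos in tagged_tokens:
        for category, prefix in CATEGORY_PREFIXES.items():
            if pos.startswith(prefix):
                grouped[category].append(word)
    return dict(grouped)


# The tagged pairs from the output above.
tagged = [("I", "PRP"), ("am", "VBP"), ("doing", "VBG"),
          ("a", "DT"), ("keyphrase", "NN"), ("classification", "NN")]

grouped = words_by_category(tagged)
print(grouped["noun"])
print(grouped["verb"])
```

Matching on the prefix rather than on the exact tag means plural nouns (NNS) and inflected verbs (VBD, VBG, and so on) are included as well.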
