For a project I would like to measure the amount of ‘human centered’ words within a text. I plan on doing this using WordNet. I have never used it and I am not quite sure how to approach this task. I want to use WordNet to count the amount of words that belong to certain synsets, for example the sysnets ‘human’ and ‘person’.
I came up with the following (simple) piece of code:
word = 'girlfriend'
word_synsets = wn.synsets(word)[0]
hypernyms = word_synsets.hypernym_paths()[0]
for element in hypernyms:
print element
Results in:
Synset('entity.n.01')
Synset('physical_entity.n.01')
Synset('causal_agent.n.01')
Synset('person.n.01')
Synset('friend.n.01')
Synset('girlfriend.n.01')
My first question is, how do I properly iterate over the hypernyms? In the code above it prints them just fine. However, when using an ‘if’ statement, for example:
count_humancenteredness = 0
for element in hypernyms:
if element == 'person':
print 'found person hypernym'
count_humancenteredness +=1
I get ‘AttributeError: 'str' object has no attribute '_name'’. What method can I use to iterate over the hypernyms of my word and perform an action (e.g. increase the count of human centerdness) when a word does indeed belong to the ‘person’ or ‘human’ synset.
Secondly, is this an efficient approach? I assume that iterating over several texts and iterating over the hypernyms of each noun will take quite some time.. Perhaps that there is another way to use WordNet to perform my task more efficiently.
Thanks for your help!