
Given that there is a path from two synsets to their lowest common hypernym, it seems reasonable that there should be some way to walk back down and find the hyponyms that lead to that hypernym.

from nltk.corpus import wordnet as wn
alaska = wn.synset('Alaska.n.1')
california = wn.synset('California.n.1')
common_hypernym = alaska.lowest_common_hypernyms(california)[0]

common_hypernym
Synset('american_state.n.01')

common_hypernym.do_something_awesome()  # hypothetical: the method I'm looking for
['Alabama.n.1', 'Alaska.n.1', ...]  # desired output: all 50 American states
Tom M
  • Can you explain what you need in more detail? It's a little unclear what you're trying to achieve. – alvas May 09 '17 at 15:06
  • See the result of `do_something_awesome()`: in this case it would be all American states. For, say, USA and Canada it might be all countries; for Atlantic and Pacific it might be all oceans. – Tom M May 09 '17 at 16:13
  • Then I don't think `wordnet` will help; try word embeddings (`word2vec`) or something else. Wordnet is manually created, and there are (embarrassingly) a lot of "holes" in its ontological knowledge. – alvas May 09 '17 at 16:14
  • word2vec sort of works, but it is far too likely to give back unrelated categories (cities, countries and other forms of geography; descriptions related to states, etc.). I guess I could go through every pair, get their _shortest_hypernym_paths, and then group by path similarity. – Tom M May 09 '17 at 16:23

2 Answers


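Use the private method Synset._shortest_hypernym_paths() to find a synset's hypernyms and their distances. Its argument is a simulate_root flag, not a second synset; a true value adds a dummy *ROOT* node at the top: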

>>> from nltk.corpus import wordnet as wn
>>> alaska = wn.synset('Alaska.n.1')
>>> california = wn.synset('California.n.1')

>>> alaska._shortest_hypernym_paths(simulate_root=True)
{Synset('district.n.01'): 4, Synset('location.n.01'): 6, Synset('region.n.03'): 5, Synset('physical_entity.n.01'): 8, Synset('entity.n.01'): 9, Synset('state.n.01'): 2, Synset('administrative_district.n.01'): 3, Synset('object.n.01'): 7, Synset('alaska.n.01'): 0, Synset('*ROOT*'): 10, Synset('american_state.n.01'): 1}
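The dictionary above is consistent with a breadth-first walk up the hierarchy, with the synset itself at distance 0. A rough sketch of that idea on a hypothetical toy hierarchy (a plain dict of illustrative edges, not NLTK's data); `hypernym_distances` is an illustrative helper, not part of NLTK, and its `simulate_root` flag only imitates the dummy `*ROOT*` entry:

```python
from collections import deque

# Hypothetical toy hierarchy (illustrative names, not real WordNet data):
# each node maps to its direct parents.
HYPERNYMS = {
    "alaska": ["american_state"],
    "california": ["american_state"],
    "american_state": ["state"],
    "state": ["administrative_district"],
    "administrative_district": ["district"],
    "district": ["region"],
    "region": [],
}


def hypernym_distances(node, graph, simulate_root=False):
    """Breadth-first walk upward, recording the shortest distance from `node`
    to each ancestor (the node itself is at distance 0)."""
    dist = {}
    queue = deque([(node, 0)])
    while queue:
        current, d = queue.popleft()
        if current in dist:  # already reached by a path at least as short
            continue
        dist[current] = d
        for parent in graph.get(current, []):
            queue.append((parent, d + 1))
    if simulate_root:
        # Dummy root placed one step above the farthest ancestor found.
        dist["*ROOT*"] = max(dist.values()) + 1
    return dist


print(hypernym_distances("alaska", HYPERNYMS, simulate_root=True))
```

Note that only `alaska`'s own ancestors appear; `california` is never visited, which matches the session output above.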

Now find the minimum path:

>>> paths = alaska._shortest_hypernym_paths(simulate_root=True)
>>> min(paths, key=paths.get)
Synset('alaska.n.01')

Now, this is boring because the dictionary also contains alaska itself at distance 0. Let's filter out that zero-distance entry:

>>> paths = {k:v for k,v in paths.items() if v > 0}
>>> min(paths, key=paths.get)
Synset('american_state.n.01')
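The filter-then-`min` step needs nothing from NLTK. Here is a small sketch on a plain dict holding the distances printed above (synset names written as strings); `nearest_proper_hypernym` is a hypothetical helper that also returns None when nothing is left after filtering:

```python
# Distances copied from the session above, with synset names as plain strings.
paths = {
    "alaska.n.01": 0, "american_state.n.01": 1, "state.n.01": 2,
    "administrative_district.n.01": 3, "district.n.01": 4, "region.n.03": 5,
    "location.n.01": 6, "object.n.01": 7, "physical_entity.n.01": 8,
    "entity.n.01": 9, "*ROOT*": 10,
}


def nearest_proper_hypernym(distances):
    """Drop the distance-0 entry (the synset itself) and return the closest
    remaining ancestor, or None if there is none."""
    ancestors = {k: v for k, v in distances.items() if v > 0}
    if not ancestors:
        return None
    return min(ancestors, key=ancestors.get)


print(nearest_proper_hypernym(paths))  # american_state.n.01
```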

To get the child nodes of american_state (I suppose this is the "something awesome" you need):

>>> min(paths, key=paths.get).hyponyms()
[Synset('free_state.n.02'), Synset('slave_state.n.01')]
>>> list(min(paths, key=paths.get).closure(lambda s:s.hyponyms()))
[Synset('free_state.n.02'), Synset('slave_state.n.01')]

This might look shocking, but there are no class-level hypernyms listed for alaska or california; both are attached to american_state.n.01 as instances (see instance_hypernyms()):

>>> alaska.hypernyms()
[]
>>> california.hypernyms()
[]

The *ROOT* entry in the _shortest_hypernym_paths output is a dummy root that the method adds when its simulate_root flag is set; for more on that dummy root, take a look at Is wordnet path similarity commutative?
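An empty hypernyms() list does not mean a synset has no parent: WordNet keeps instance links (an individual belonging to a class) separate from class links, and NLTK exposes them through instance_hypernyms() and instance_hyponyms(). The toy sketch below (hypothetical dicts and a hypothetical `descendants` helper, not real WordNet data) shows why walking only class-level hyponyms stops at free_state/slave_state, while also following instance links reaches the individual states:

```python
from collections import deque

# Hypothetical toy model (illustrative names, not real WordNet data).
# Class links and instance links live in separate tables, mirroring
# WordNet's split between hyponyms and instance hyponyms.
HYPONYMS = {"american_state": ["free_state", "slave_state"]}
INSTANCE_HYPONYMS = {"american_state": ["alabama", "alaska", "california"]}
HYPERNYMS = {"free_state": ["american_state"], "slave_state": ["american_state"]}
INSTANCE_HYPERNYMS = {
    "alabama": ["american_state"],
    "alaska": ["american_state"],
    "california": ["american_state"],
}


def descendants(node, include_instances=True):
    """Breadth-first walk down from `node`, optionally following instance links."""
    seen, found, queue = {node}, [], deque([node])
    while queue:
        current = queue.popleft()
        children = list(HYPONYMS.get(current, []))
        if include_instances:
            children += INSTANCE_HYPONYMS.get(current, [])
        for child in children:
            if child not in seen:
                seen.add(child)
                found.append(child)
                queue.append(child)
    return found


print(HYPERNYMS.get("alaska", []))           # [] : no class-level parent
print(INSTANCE_HYPERNYMS.get("alaska", []))  # ['american_state']
print(descendants("american_state", include_instances=False))
# ['free_state', 'slave_state']
print(descendants("american_state"))
# ['free_state', 'slave_state', 'alabama', 'alaska', 'california']
```

With the real corpus, the corresponding NLTK calls would be alaska.instance_hypernyms() and wn.synset('american_state.n.01').instance_hyponyms(); that the installed WordNet data lists all 50 states there is an assumption worth checking.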

alvas
  • Indeed, I'd used hyponyms() on american_state, and it only returns free/slave states, which have nothing under them, so I assumed there must be another way to get at it. – Tom M May 09 '17 at 16:16

A newer solution:

from nltk.corpus import wordnet
alaska = wordnet.synset('Alaska.n.1')
california = wordnet.synset('California.n.1')
alaska.lowest_common_hypernyms(california)

[Synset('american_state.n.01')]

The old function (_shortest_hypernym_paths) is private and doesn't work this way. You can also use x.common_hypernyms(y) to find all common hypernyms.
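The two public methods differ in what they keep: common_hypernyms returns every shared ancestor, while lowest_common_hypernyms narrows that set to the deepest ones. The sketch below illustrates that relationship on a hypothetical toy hierarchy (plain dicts and illustrative helper functions, not NLTK's implementation or real WordNet data); keeping the shared ancestors with the greatest maximum depth is an assumed approximation of NLTK's default behaviour:

```python
# Hypothetical toy hierarchy (illustrative names, not real WordNet data):
# each node maps to its direct parents, class and instance links merged.
HYPERNYMS = {
    "alaska": ["american_state"],
    "california": ["american_state"],
    "ontario": ["canadian_province"],
    "american_state": ["state"],
    "canadian_province": ["state"],
    "state": ["district"],
    "district": [],
}


def ancestors(node):
    """Every node reachable upward from `node`, including `node` itself."""
    seen, stack = set(), [node]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(HYPERNYMS.get(current, []))
    return seen


def max_depth(node):
    """Length of the longest path from `node` up to a node with no parents."""
    parents = HYPERNYMS.get(node, [])
    return 0 if not parents else 1 + max(max_depth(p) for p in parents)


def common_hypernyms(a, b):
    """Ancestors shared by `a` and `b`."""
    return ancestors(a) & ancestors(b)


def lowest_common_hypernyms(a, b):
    """Shared ancestors that sit deepest in the hierarchy."""
    common = common_hypernyms(a, b)
    if not common:
        return []
    deepest = max(max_depth(c) for c in common)
    return sorted(c for c in common if max_depth(c) == deepest)


print(sorted(common_hypernyms("alaska", "california")))
# ['american_state', 'district', 'state']
print(lowest_common_hypernyms("alaska", "california"))  # ['american_state']
print(lowest_common_hypernyms("alaska", "ontario"))     # ['state']
```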

Peter.k