I would like to group the keys of a dictionary based on how similar they are to each other. The idea is to compare the keys with one another and, if two keys are similar enough according to some sort of similarity score, put them in the same group. I am specifically not interested in how the values in the dictionary match up (in the example below I kept them the same for similar keys). I have been looking at similarity scores using sklearn's cosine_similarity, but I could not find a way to apply it to the keys of a dictionary. Does anyone have any pointers?
I made a test dictionary to show what I mean. Some keys are very similar, and I would like to group those. What to do with the grouped entries is beside the point for now, but let's say I would like to add the population numbers up.
As always, many thanks!
from sklearn.metrics.pairwise import cosine_similarity
dictionary = {
    'United States': {'population': 350, 'Continent': 'North America'},
    'united states': {'population': 350, 'Continent': 'North America'},
    'the United States of America': {'population': 350, 'Continent': 'North America'},
    'USA': {'population': 350, 'Continent': 'North America'},
    'Netherlands': {'population': 17, 'Continent': 'Europe'},
    'the Netherlands': {'population': 17, 'Continent': 'Europe'},
    'Japan': {'population': 160, 'Continent': 'Japan'},
}
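
To make the goal more concrete, here is a rough sketch of the kind of result I am after. It is only an illustration under my own assumptions, not a working solution I am happy with: it turns each key into character-bigram counts, computes cosine similarity by hand (so it does not use sklearn), and greedily assigns each key to the first group whose first member is similar enough. The helper names (char_ngrams, cosine, group_keys), the bigram representation, and the 0.5 threshold are all arbitrary choices of mine, and the populations dictionary is a simplified copy of the test data above.

```python
from collections import Counter
from math import sqrt


def char_ngrams(text, n=2):
    """Count character n-grams of a lowercased, whitespace-normalized string."""
    normalized = " ".join(text.lower().split())
    padded = f" {normalized} "  # pad so word boundaries produce n-grams too
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))


def cosine(a, b):
    """Cosine similarity between two n-gram count vectors (Counters)."""
    dot = sum(a[g] * b[g] for g in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def group_keys(keys, threshold=0.5):
    """Greedy grouping: a key joins the first group whose first member
    has cosine similarity >= threshold; otherwise it starts a new group."""
    groups = []
    for key in keys:
        vec = char_ngrams(key)
        for group in groups:
            if cosine(vec, char_ngrams(group[0])) >= threshold:
                group.append(key)
                break
        else:
            groups.append([key])
    return groups


# Simplified copy of the test data: key -> population only.
populations = {
    'United States': 350,
    'united states': 350,
    'the United States of America': 350,
    'USA': 350,
    'Netherlands': 17,
    'the Netherlands': 17,
    'Japan': 160,
}

groups = group_keys(populations)
# Add the numbers up per group, labelled by the group's first key.
merged = {group[0]: sum(populations[k] for k in group) for group in groups}
print(groups)
print(merged)
```

With this threshold, 'United States', 'united states' and 'the United States of America' end up in one group, and 'Netherlands' and 'the Netherlands' in another, but 'USA' stays on its own because it shares almost no character bigrams with the longer spellings. So a purely character-based score seems to handle casing and extra words, but not abbreviations, which is part of why I am asking for a better approach.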