Unfortunately, NLTK has no function that directly computes word similarity, although it does support synset similarities through its WordNet API.
Though not exhaustive, here's a list of pre-trained word embeddings that can be used to compute the cosine similarity between word vectors: https://github.com/alvations/vegetables
As an example, here are the HLBL embeddings (from Turian et al. 2011): https://www.kaggle.com/alvations/vegetables-hlbl-embeddings (scroll down to the data explorer and download the directory directly; the top download button on the dataset page seems to lead to corrupted data).
After downloading, you can load the embeddings using numpy:
>>> import numpy as np
>>> embeddings = np.load('hlbl.rcv1.original.50d.npy')
>>> tokens = [line.strip() for line in open('hlbl.rcv1.original.50d.txt')]
>>> embeddings[tokens.index('hello')]
array([-0.21167406, -0.04189226, 0.22745571, -0.09330438, 0.13239339,
0.25136262, -0.01908735, -0.02557277, 0.0029353 , -0.06194451,
-0.22384156, 0.04584747, 0.03227248, -0.13708033, 0.17901117,
-0.01664691, 0.09400477, 0.06688628, -0.09019949, -0.06918809,
0.08437972, -0.01485273, -0.12062263, 0.05024147, -0.00416972,
0.04466985, -0.05316647, 0.00998635, -0.03696947, 0.10502578,
-0.00190554, 0.03435732, -0.05715087, -0.06777468, -0.11803425,
0.17845355, 0.18688948, -0.07509124, -0.16089943, 0.0396672 ,
-0.05162677, -0.12486628, -0.03870481, 0.0928738 , 0.06197058,
-0.14603543, 0.04026282, 0.14052328, 0.1085517 , -0.15121481])
To compute the similarity of two numpy arrays, you can use cosine similarity (see the question "Cosine Similarity between 2 Number Lists"):
import numpy as np
cos_similarity = lambda a, b: np.dot(a, b)/(np.linalg.norm(a)*np.linalg.norm(b))
x, y = np.array([1,2,3]), np.array([2,2,1])
cos_similarity(x,y)
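Putting the two pieces together, a word-level similarity function could be sketched roughly as below. The names `word_similarity` and `token_to_row` are my own, not part of NLTK or the embeddings repo, and the tiny 3x2 matrix is made-up data standing in for the real `tokens` list and `embeddings` array loaded above, so treat it as an illustration rather than a definitive implementation.

```python
import numpy as np

def cos_similarity(a, b):
    """Cosine of the angle between two 1-D vectors."""
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

def word_similarity(word1, word2, token_to_row, embeddings):
    """Cosine similarity of two words; raises KeyError for unknown words.

    Hypothetical helper, not an NLTK function.
    """
    return cos_similarity(embeddings[token_to_row[word1]],
                          embeddings[token_to_row[word2]])

# Toy stand-ins (made-up values) for the real `tokens` and `embeddings`.
tokens = ['hello', 'hi', 'car']
embeddings = np.array([[1.0, 0.0],
                       [1.0, 1.0],
                       [0.0, 1.0]])

# A dict gives constant-time lookups; tokens.index() scans the list each call.
token_to_row = {token: row for row, token in enumerate(tokens)}

print(word_similarity('hello', 'hi', token_to_row, embeddings))
print(word_similarity('hello', 'car', token_to_row, embeddings))
```

With the real data you would build `token_to_row` once from the loaded `tokens` list and reuse it for every query, which avoids the linear scan that `tokens.index('hello')` does on each lookup.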