When I train a Word2Vec model with the Python library gensim on my local machine,
I can call word2vec_result.similarity('apple', 'banana')
to get the cosine similarity between apple and banana.
In PySpark (version 2.2), however,
I can't find an equivalent function in the documentation once the model is built.
Code:
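#!/usr/bin/env python
# -*- coding: utf-8 -*-
from pyspark import SparkConf, SparkContext
from pyspark.mllib.feature import Word2Vec

def word2vec_run(rdd):
    # Body not shown in the original post; assumed here to train with default settings.
    return Word2Vec().fit(rdd)

# The app name is arbitrary; the original snippet relied on an existing `sc`.
sc = SparkContext(conf=SparkConf().setAppName("word2vec"))
directory = "data_path"
# Split each line of text into a list of tokens.
inp = sc.textFile(directory).map(lambda row: row.split(" "))
model = word2vec_run(inp)
# Persist the trained model.
model.save(sc, "/data/word2vec_model")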
Is there a simple way to get the same result?
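
One possible direction, as a minimal sketch rather than a confirmed solution: the mllib Word2VecModel has transform(word), which returns the vector for a single word, so the cosine similarity could be computed by hand from two such vectors. The helper names cosine_similarity and word_similarity below are hypothetical and not part of PySpark.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length sequences of numbers."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        # Convention chosen for this sketch: zero vectors get similarity 0.
        return 0.0
    return dot / (norm_u * norm_v)


def word_similarity(model, word1, word2):
    """Hypothetical stand-in for gensim's similarity() on an mllib Word2VecModel.

    Assumes model.transform(word) returns an mllib Vector exposing toArray().
    """
    vec1 = model.transform(word1).toArray()
    vec2 = model.transform(word2).toArray()
    return cosine_similarity(vec1, vec2)
```

With the model trained above, word_similarity(model, 'apple', 'banana') would play the role of the gensim call. model.findSynonyms(word, num) also reports cosine similarities, but only for the nearest neighbours of one word, not for an arbitrary pair.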