
When I train a Word2Vec model with the Python library gensim on my local machine, I can call word2vec_result.similarity('apple', 'banana') to get the cosine similarity between "apple" and "banana".
In PySpark (version 2.2), however, I can't find an equivalent function in the documentation once the model is built.

Code:

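#!/usr/bin/env python
# -*- coding: utf-8 -*-
from pyspark import SparkConf, SparkContext
from pyspark.mllib.feature import Word2Vec

# The original snippet used `sc` without defining it; create it explicitly.
sc = SparkContext(conf=SparkConf().setAppName("word2vec"))

directory = "data_path"
# One sentence per line, tokenized on single spaces.
inp = sc.textFile(directory).map(lambda row: row.split(" "))
# word2vec_run() was my own helper and is not shown; Word2Vec().fit() stands in for it here.
model = Word2Vec().fit(inp)
model.save(sc, "/data/word2vec_model")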

Is there a simple way to get the cosine similarity between two words from this model?

chilun
  • I think there is no such function. If you only need the similarity values for the most similar words, you can use the findSynonyms() method. Otherwise, I believe a workaround is needed; see https://stackoverflow.com/questions/43921636/apache-spark-python-cosine-similarity-over-dataframes – Suresh Aug 15 '17 at 08:14
  • @Suresh, thank you very much for the reply. I think I will have to find another way to solve this problem. – chilun Aug 17 '17 at 09:09
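
One possible workaround along the lines of the comments above, as a minimal sketch rather than a built-in API: the pyspark.mllib Word2VecModel has a transform(word) method that returns the vector for a single word on the driver, so the cosine similarity can be computed by hand from two such vectors. The helper names cosine_similarity and word_similarity are hypothetical and not part of PySpark; the sketch assumes both words are in the model's vocabulary.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length numeric sequences."""
    u = [float(x) for x in u]
    v = [float(x) for x in v]
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


def word_similarity(model, word_a, word_b):
    """Hypothetical helper: cosine similarity of two words in a Word2VecModel.

    Assumes model.transform(word) returns a vector object with toArray(),
    as pyspark.mllib.feature.Word2VecModel does for in-vocabulary words.
    """
    vec_a = model.transform(word_a).toArray()
    vec_b = model.transform(word_b).toArray()
    return cosine_similarity(vec_a, vec_b)
```

With the model from the question, this would be called as word_similarity(model, 'apple', 'banana'). Because transform() works on the driver only, this suits occasional lookups; for many word pairs it is probably better to pull the vectors once with model.getVectors() and reuse them.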

0 Answers