I have built a Word2Vec model using Spark and saved it to disk. Now I want to use it offline in another program. I loaded the model and used it to get the vector of a single word (e.g. Hello), and that works well. However, I need to call it for many words in an RDD using map.
When I call model.transform() in a map function, it throws this error:
Exception: It appears that you are attempting to reference SparkContext from a broadcast variable, action, or transforamtion. SparkContext can only be used on the driver, not in code that it run on workers. For more information, see SPARK-5063.
The code:
from pyspark import SparkContext
from pyspark.mllib.feature import Word2Vec
from pyspark.mllib.feature import Word2VecModel
sc = SparkContext('local[4]', appName='Word2Vec')
model = Word2VecModel.load(sc, "word2vecModel")
x = model.transform("Hello")
print(x[0])  # it works fine and returns [0.234, 0.800,....]
y = sc.parallelize([['Hello'], ['test']])
y.map(lambda w: model.transform(w[0])).collect()  # it throws the error
I would really appreciate your help.
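For reference, here is a minimal sketch of one workaround that might apply. The error most likely arises because the lambda captures `model`, which wraps a JVM object tied to the SparkContext and so cannot be shipped to workers. The sketch assumes that `model.getVectors()` returns a map-like object of word to vector on the driver, which can be copied into a plain Python dict and broadcast. The helper names `copy_vectors`, `lookup`, and `transform_words` are hypothetical and not part of the Spark API.

```python
def copy_vectors(model):
    """Copy the model's word -> vector map into a plain, picklable dict.

    Assumption: model.getVectors() returns a map-like object whose values
    are sequences of floats. This runs on the driver only.
    """
    return {word: list(vec) for word, vec in model.getVectors().items()}


def lookup(vectors, word):
    """Return the vector for `word`, or None if the word is not in the map.

    Note: unlike model.transform(), this does not raise for unknown words.
    """
    return vectors.get(word)


def transform_words(sc, model, words_rdd):
    """Map each record of `words_rdd` (a list whose first item is a word)
    to its vector, using a broadcast dict instead of the model itself."""
    vectors_bc = sc.broadcast(copy_vectors(model))
    # The lambda captures only the broadcast handle, not `model` or `sc`.
    return words_rdd.map(lambda w: lookup(vectors_bc.value, w[0]))
```

With the variables from the code above, this would be called as `transform_words(sc, model, y).collect()`. The trade-off is that the whole vocabulary is copied to every worker, so this is only practical when the vector map fits comfortably in memory.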