I am working with data in which the user identifier is a string, and I would like to assign a unique integer to each distinct string.
I was loosely following this Stack Overflow post. I use the expression below to build an RDD of (user string, integer id) tuples:
user = data.map(lambda x:x[0]).distinct().zipWithUniqueId()
After that, I ran:
data = data.map(lambda x: Rating(int(user.lookup(x[0])), int(x[1]), float(x[2])))
What I ultimately want to do is run an ALS model on the result, but so far I keep getting this error message:
Exception: It appears that you are attempting to broadcast an RDD or reference an RDD from an action or transformation.
I think the data type is somehow wrong, but I am not sure how to fix this.
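To make the goal concrete, here is a minimal plain-Python sketch (no Spark) of the mapping I am after. The sample rows and the `user_ids` name are made up for illustration, and `Rating` is only a stand-in namedtuple with the same field names as `pyspark.mllib.recommendation.Rating`.

```python
from collections import namedtuple

# Stand-in for pyspark.mllib.recommendation.Rating so the sketch runs without Spark.
Rating = namedtuple("Rating", ["user", "product", "rating"])

# Hypothetical sample rows: (user string, product id, rating), all read as strings.
rows = [
    ("alice", "10", "4.0"),
    ("bob", "10", "3.5"),
    ("alice", "20", "5.0"),
]

# Assign each distinct user string a unique integer, in order of first appearance.
user_ids = {}
for name, _, _ in rows:
    if name not in user_ids:
        user_ids[name] = len(user_ids)

# Replace the user string with its integer id in every row.
ratings = [Rating(user_ids[u], int(p), float(r)) for u, p, r in rows]
```

In Spark, the same dictionary could presumably be obtained by collecting the (string, id) pairs to the driver, for example with `collectAsMap()`, assuming the number of distinct users fits in driver memory.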