ALS in mllib vs ALS in ml ---- spark

Question

I want to use ALS (Alternating Least Squares matrix factorization) to get predictions from training data. My understanding was that ALS in the mllib and ml packages does the same job, so given the same training and test data, both should produce the same output.

However, I may be wrong. Consider the following code:

from pyspark import SparkContext
from pyspark.sql import SQLContext

sparkC = SparkContext()
sqlC = SQLContext(sparkC)
trainData = sparkC.textFile("Data/trainData.txt").map(lambda line:line.split("\t"))
testData = sparkC.textFile("Data/testData.txt").map(lambda line: line.split("\t"))
print(testData.count())  # output1

#---------when using the ml package----------------
from pyspark.ml.recommendation import ALS

# trainDataFrame and testDataFrame are not defined in this snippet
als = ALS(rank=10, maxIter=20)
model = als.fit(trainDataFrame)
predTestData = model.transform(testDataFrame)
print(predTestData.count())  # output2
#----------------------------------------------

#---------when using the mllib package----------------
from pyspark.mllib.recommendation import ALS

model = ALS.train(trainData, 10, seed=3, iterations=20)

predTestData = model.predictAll(testData).map(lambda r: (r.user, r.product, r.rating))
print(predTestData.count())  # output3

In the code above, the training data and test data are the same for both ml and mllib, yet the outputs differ. The number of predictions should equal the number of test rows. In my case, output1 = output2, which is fine, but output3 < output1, which means some predictions disappear!

What causes this? Or is ALS in ml different from ALS in mllib?

There's not really enough information to go on here, but I suspect that you have users that are in the test set but not in the train set. The ML transform method produces NaNs for these users, but MLlib predict will simply exclude them. If you could post a sample of your data, and the counts that you get, it would be more helpful. - Seth Hendrickson, Apr 27 '16 at 21:52
@SethHendrickson you are right. There are some users that are in the test set but not in the train set. Thanks a lot. - sydridgm, Apr 27 '16 at 22:57
@SethHendrickson actually, there are some items that are in the test set but not in the train set - sydridgm, Apr 27 '16 at 23:18
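
The explanation in the comments can be checked without Spark. The following is a minimal sketch in plain Python, not the poster's actual pipeline: the `train` and `test` triples are made-up sample data and `split_by_coverage` is a hypothetical helper. It separates test rows whose user and item both appear in the training set from "cold-start" rows, which, per the comments, the ml transform would score as NaN and the mllib predict would omit.

```python
# Hypothetical (user, item, rating) triples standing in for trainData / testData.
train = [(1, 10, 4.0), (1, 11, 3.0), (2, 10, 5.0)]
test = [(1, 10, 4.0), (2, 11, 2.0), (3, 10, 1.0), (1, 12, 5.0)]


def split_by_coverage(train, test):
    """Split test rows into those seen in training and cold-start rows."""
    known_users = {user for user, _, _ in train}
    known_items = {item for _, item, _ in train}
    scorable, cold_start = [], []
    for row in test:
        user, item, _ = row
        # A factor model only has latent factors for users and items
        # that appeared in the training data.
        if user in known_users and item in known_items:
            scorable.append(row)
        else:
            cold_start.append(row)
    return scorable, cold_start


scorable, cold_start = split_by_coverage(train, test)
print(len(test), len(scorable), len(cold_start))
```

If the comments are right, output2 would correspond to `len(test)` and output3 to `len(scorable)`. To make the two counts agree on the ml side, one option is to drop the NaN rows after the transform, for example with `predTestData.na.drop()`.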
