
I am new to Apache Spark. I ran the sample ALS algorithm code from the examples folder, giving a CSV file as input. When I use model.save(path) to save the model, it is stored as gz.parquet files.

When I tried to open these files, I got errors.

Now I want to store the recommendation model generated in a text or csv file for using it outside Spark.

I tried the following call to store the generated model in a file, but it did not work:

model.saveAsTextFile("path")

Please suggest a way to overcome this issue.

Shishir Anshuman

2 Answers


Let's say you have trained your model with something like this:

val model = ALS.train(ratings, rank, numIterations, 0.01)

All that you have to do is:

import org.apache.spark.mllib.recommendation.ALS
import org.apache.spark.mllib.recommendation.MatrixFactorizationModel
import org.apache.spark.mllib.recommendation.Rating    
// Save
model.save(sc, "yourpath/yourmodel")
// Load Model
val sameModel = MatrixFactorizationModel.load(sc, "yourpath/yourmodel")
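If the goal is to use the model outside Spark, one option is to export the model's latent factor matrices (userFeatures and productFeatures) as plain CSV text instead of the Parquet format that save produces. A minimal sketch, assuming the output paths are placeholders; note saveAsTextFile writes a directory of part files:

```scala
// Sketch: dump the latent factors as CSV lines "id,f1,f2,...,fRank"
// so another application can rebuild the factorization.
// "yourpath/..." output paths are assumptions.
model.userFeatures
  .map { case (id, factors) => id.toString + "," + factors.mkString(",") }
  .saveAsTextFile("yourpath/userFactors")

model.productFeatures
  .map { case (id, factors) => id.toString + "," + factors.mkString(",") }
  .saveAsTextFile("yourpath/productFactors")
```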
JoseM LM
  • The above snippet is already present in my code. I want to save the recommendation model in a text file, so I can use it in another application (outside Spark). – Shishir Anshuman Mar 15 '16 at 16:16
  • The above code snippet saves the model in folders containing gz.parquet files. – Shishir Anshuman Mar 15 '16 at 17:32
  • You can iterate over all your training data, extract a prediction for each data point, and write it to a text file. You will then need a method to load those predictions into your target model (scikit or whatever). – JoseM LM Mar 17 '16 at 07:29
  • After performing prediction on the training data, when I try to store the result in a text file, the text file only contains the path of the directory. The predictions are in RDD format; I want to store this RDD in textual form in a text file. – Shishir Anshuman Mar 17 '16 at 15:47
  • If the problem is writing the RDD to a text file, you can do it like this: http://stackoverflow.com/questions/31666361/process-spark-streaming-rdd-and-store-to-single-hdfs-file/31669187#31669187 – JoseM LM Mar 18 '16 at 10:44
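The approach suggested in the comments above can be sketched as follows. This assumes the `ratings` RDD from the training step is still in scope and the output path is a placeholder:

```scala
// Score every (user, product) pair seen in training and write the
// predictions out as plain "user,product,rating" text lines.
val userProducts = ratings.map { case Rating(user, product, _) => (user, product) }
val predictions = model.predict(userProducts)
predictions
  .map { case Rating(user, product, rating) => s"$user,$product,$rating" }
  .saveAsTextFile("yourpath/predictions")
```

As with any saveAsTextFile call, the result is a directory of part files, one per partition, not a single file.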

As it turns out, saveAsTextFile() writes the data out from the worker nodes as partitioned part files. Use collect() to gather the data from the workers so it can be saved locally on the master as a single file. Solution can be found here
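A minimal sketch of the collect() approach, assuming `predictions` is an RDD[Rating] as in the other answer and the output filename is a placeholder:

```scala
import java.io.PrintWriter

// collect() pulls the whole RDD into driver memory, so this is only
// safe when the predictions are small enough to fit on one machine.
val pw = new PrintWriter("predictions.csv")
predictions.collect().foreach { case Rating(user, product, rating) =>
  pw.println(s"$user,$product,$rating")
}
pw.close()
```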

Shishir Anshuman