
I'm writing a Spark Streaming application and I need to save the updated model after each batch, so I did the following:

data.foreachRDD { rdd =>
  model = model.update(rdd)
  // model.nodes is an Array[Vector]
  rdd.context.parallelize(model.nodes).saveAsTextFile("target/model")
}

The problem is that I get this error on every batch after the first (since foreachRDD runs repeatedly):

Output directory file "target/model" already exists

Does anyone have an idea how to solve this problem? Thanks.

Momog
  • add an integer value that you can increment with each loop and append it to the file name? – SierraOscar Mar 24 '15 at 14:45
  • @SierraOscar - yes, it is an option, but I only want to keep the last model (with an incrementing integer it'll create a lot of files!) – Momog Mar 24 '15 at 15:22
  • you could test the upper boundary of the model.nodes array and if the increment integer = that boundary then save? – SierraOscar Mar 24 '15 at 15:25
  • I couldn't, because it is supposed to be an infinite stream! – Momog Mar 24 '15 at 17:26
  • Possible duplicate of http://stackoverflow.com/questions/27033823/how-to-overwrite-the-output-directory-in-spark –  May 16 '17 at 18:22
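One approach, in the spirit of the linked duplicate, is to delete the output directory before each save using Hadoop's FileSystem API, so that saveAsTextFile can recreate it on every batch. This is a hedged sketch, not a tested answer; `data` and `model` are assumed to be the same variables as in the question:

```scala
import org.apache.hadoop.fs.{FileSystem, Path}

data.foreachRDD { rdd =>
  model = model.update(rdd)
  // Remove the previous snapshot (recursive = true) so the save below
  // does not fail with "Output directory ... already exists".
  val fs = FileSystem.get(rdd.context.hadoopConfiguration)
  fs.delete(new Path("target/model"), true)
  rdd.context.parallelize(model.nodes).saveAsTextFile("target/model")
}
```

Note that this briefly leaves no model on disk between the delete and the save; writing each batch to a temporary directory and then renaming it over the old one avoids that window. Alternatively, setting `spark.hadoop.validateOutputSpecs` to `false` disables the existence check entirely, though stale part files from earlier batches may then linger in the directory.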

0 Answers