
I'm writing a Spark Streaming application and I need to save the updated model after each batch, so I did the following:

data.foreachRDD { rdd =>
  model = model.update(rdd)
  // model.nodes is an Array[Vector]
  rdd.context.parallelize(model.nodes).saveAsTextFile("target/model")
}

The problem is that I get this error on every batch after the first (since foreachRDD runs repeatedly):

Output directory file "target/model" already exists

Does anyone have an idea how to solve this problem? Thanks.

Momog
  • add an integer value that you can increment with each loop and append it to the file name? – SierraOscar Mar 24 '15 at 14:45
  • @SierraOscar - yes, it is an option, but I only want to keep the last model (with an incrementing integer it'll create a lot of files!) – Momog Mar 24 '15 at 15:22
  • you could test the upper boundary of the model.nodes array and if the increment integer = that boundary then save? – SierraOscar Mar 24 '15 at 15:25
  • I couldn't, because it is supposed to be an infinite stream! – Momog Mar 24 '15 at 17:26
  • Possible duplicate of http://stackoverflow.com/questions/27033823/how-to-overwrite-the-output-directory-in-spark –  May 16 '17 at 18:22
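One approach, in the spirit of the linked duplicate, is to delete the output directory before each save using Hadoop's FileSystem API, so that saveAsTextFile can recreate it on every batch. This is a hedged sketch, not a tested answer; `data` and `model` are assumed to be the same variables as in the question:

```scala
import org.apache.hadoop.fs.{FileSystem, Path}

data.foreachRDD { rdd =>
  model = model.update(rdd)
  // Remove the previous snapshot (recursive = true) so the save below
  // does not fail with "Output directory ... already exists".
  val fs = FileSystem.get(rdd.context.hadoopConfiguration)
  fs.delete(new Path("target/model"), true)
  rdd.context.parallelize(model.nodes).saveAsTextFile("target/model")
}
```

Note that this briefly leaves no model on disk between the delete and the save; writing each batch to a temporary directory and then renaming it over the old one avoids that window. Alternatively, setting `spark.hadoop.validateOutputSpecs` to `false` disables the existence check entirely, though stale part files from earlier batches may then linger in the directory.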

0 Answers