I am saving an ML model to an S3 bucket. After a long search this thread helped me find a solution. My code looks as follows:

sc.parallelize(Seq(model), 1).saveAsObjectFile("s3a://bucket/nameModel.model")

The first time I ran this job, everything went fine. The second time I got:

FileAlreadyExistsException: Output directory "s3a://bucket/nameModel.model" already exists

I didn't find a solution to overwrite this model. So I first tried to delete the existing model before saving it:

val instanceProfileCredentialsProvider = new com.amazonaws.auth.InstanceProfileCredentialsProvider()
val amazonS3Client = new AmazonS3Client(instanceProfileCredentialsProvider)
amazonS3Client.deleteObject(new DeleteObjectRequest("bucket", "nameModel.model"))
sc.parallelize(Seq(model), 1).saveAsObjectFile("s3a://bucket/nameModel.model")

No success: I still get the same exception. The new code doesn't seem to delete the existing model. Is there another way to overwrite or delete the current ML model in the S3 bucket?

RudyVerboven
  • in your code, you deleted "nameModel", while you are creating "nameModel.model". Is that only the case in the question or in your code too? – Ashish Awasthi Apr 28 '16 at 10:50
  • By the way, have you tried enabling versioning on your S3 bucket (http://docs.aws.amazon.com/AmazonS3/latest/dev/ObjectVersioning.html)? That way you also won't lose old models. – Ashish Awasthi Apr 28 '16 at 10:52
  • I edited my question. That was a typo. In my code I'm actually trying to delete an object with the same name. – RudyVerboven Apr 28 '16 at 11:06

1 Answer

How about using org.apache.hadoop.fs.FileSystem? Something like:

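import java.net.URI
import org.apache.hadoop.fs.{FileSystem, Path}

// Assumes a SparkContext named sc is in scope.
def deleteS3Path(): Unit = {
  // recursive = true removes the path and everything stored under it
  FileSystem.get(new URI("s3n://mybucket"), sc.hadoopConfiguration)
    .delete(new Path("s3n://mybucket/prefix/mykey"), true)
}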

Will this work for you?
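A likely reason the deleteObject call in the question has no effect: saveAsObjectFile writes a directory of part files (keys such as nameModel.model/part-00000) rather than a single object named nameModel.model, so deleting that one key leaves the output in place. A recursive delete through the Hadoop FileSystem API removes everything under the prefix. Below is a minimal sketch of applying that to the path from the question. The helper name deleteIfExists is hypothetical (it is not part of Spark or Hadoop), and sc and model are assumed to be in scope as in the question.

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

// Hypothetical helper: recursively removes an output path if it exists,
// so that a later save to the same location does not collide with it.
def deleteIfExists(pathStr: String, conf: Configuration): Boolean = {
  val path = new Path(pathStr)
  // Resolve the file system from the path's own scheme (s3a, s3n, hdfs, file, ...)
  val fs = FileSystem.get(path.toUri, conf)
  // recursive = true also removes the part files stored under the path
  if (fs.exists(path)) fs.delete(path, true) else false
}

// Usage with the names from the question (sc and model assumed in scope):
// deleteIfExists("s3a://bucket/nameModel.model", sc.hadoopConfiguration)
// sc.parallelize(Seq(model), 1).saveAsObjectFile("s3a://bucket/nameModel.model")
```

Using the same scheme (s3a here) for both the delete and the save keeps both calls on the same file system implementation and credentials.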

Pramit