
I want to read a dataset from an S3 directory, make some updates, and write it back to the same path. What I do is:

  dataSetWriter.writeDf(
    finalDataFrame,
    destinationPath,
    destinationFormat,
    SaveMode.Overwrite,
    destinationCompression)  
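
For reproduction, here is a minimal sketch of the end-to-end flow, assuming `writeDf` simply delegates to Spark's standard `DataFrameWriter` (the column update and the compression codec below are placeholders, not my real code):

    import org.apache.spark.sql.SaveMode
    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions.lit

    val spark = SparkSession.builder().appName("OverwriteSamePath").getOrCreate()

    val destinationPath = "s3://processed/fullTableUpdated.parquet"

    // Reading is lazy: this DataFrame is just a plan that points at the source files.
    val df = spark.read.parquet(destinationPath)

    // Placeholder for the actual updates.
    val finalDataFrame = df.withColumn("updated", lit(true))

    // Overwrite the same path that the plan above still reads from.
    finalDataFrame
      .write
      .mode(SaveMode.Overwrite)
      .option("compression", "snappy") // stand-in for destinationCompression
      .parquet(destinationPath)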

However, my job fails with this error message:

    java.io.FileNotFoundException: No such file or directory 's3://processed/fullTableUpdated.parquet/part-00503-2b642173-540d-4c7a-a29a-7d0ae598ea4a-c000.parquet'
    It is possible the underlying files have been updated. You can explicitly invalidate the cache in Spark by running 'REFRESH TABLE tableName' command in SQL or by recreating the Dataset/DataFrame involved.

Why is this happening? Is there something I am missing about "overwrite" mode?

Thanks

[This question](https://stackoverflow.com/questions/42920748/spark-sql-savemode-overwrite-getting-java-io-filenotfoundexception-and-requirin) may help. It looks like it's the same problem. – Hellen Nov 25 '19 at 21:55
