I want to read a dataset from an S3 directory, make some updates, and write it back (overwrite) to the same path. What I do is:
dataSetWriter.writeDf(
  finalDataFrame,
  destinationPath,
  destinationFormat,
  SaveMode.Overwrite,
  destinationCompression)
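For context, writeDf is just a thin wrapper around Spark's DataFrameWriter. End to end, the job does roughly the following (a simplified sketch, not the exact production code: the updatedAt column is a placeholder for the real updates, and I'm assuming the wrapper simply delegates to df.write with the given mode, format and compression):

import org.apache.spark.sql.{SaveMode, SparkSession}
import org.apache.spark.sql.functions.current_timestamp

val spark = SparkSession.builder().appName("updateInPlace").getOrCreate()

val destinationPath = "s3://processed/fullTableUpdated.parquet"

// Read the current snapshot from the destination directory.
val sourceDf = spark.read.parquet(destinationPath)

// Placeholder for the real update logic.
val finalDataFrame = sourceDf.withColumn("updatedAt", current_timestamp())

// What writeDf effectively does: write back to the SAME path
// the DataFrame was read from.
finalDataFrame.write
  .mode(SaveMode.Overwrite)
  .option("compression", "snappy") // destinationCompression
  .parquet(destinationPath)        // destinationFormat = "parquet"

The key point is that the source and the destination are the same S3 directory.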
However, my job fails with the following error:
java.io.FileNotFoundException: No such file or directory 's3://processed/fullTableUpdated.parquet/part-00503-2b642173-540d-4c7a-a29a-7d0ae598ea4a-c000.parquet'
It is possible the underlying files have been updated. You can explicitly invalidate the cache in Spark by running 'REFRESH TABLE tableName' command in SQL or by recreating the Dataset/DataFrame involved.
Why is this happening? Is there something I am missing about the "overwrite" save mode?
Thanks!