
I have code in Apache Spark 1.6.3 running on Qubole which writes data to multiple tables (Parquet format) on S3. While writing to the tables I keep getting a java.io.FileNotFoundException.

I am even setting spark.sql.parquet.output.committer.class=org.apache.spark.sql.parquet.DirectParquetOutputCommitter, but this does not seem to solve my problem.
Also, while checking the logs I see that the exception is caused by the _temporary location being missing. I don't understand why the _temporary location is involved at all after switching to DirectParquetOutputCommitter, yet this exception keeps occurring.
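For reference, here is roughly how I am applying that setting (a minimal sketch; the app name, input path, and output path are placeholders, not my real tables):

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

// Spark 1.6.3-style setup; names and S3 paths below are placeholders
val conf = new SparkConf().setAppName("parquet-s3-writer")
val sc = new SparkContext(conf)
val sqlContext = new SQLContext(sc)

// Ask Spark to use the direct committer instead of the default
// FileOutputCommitter, which stages output under a _temporary directory
sqlContext.setConf(
  "spark.sql.parquet.output.committer.class",
  "org.apache.spark.sql.parquet.DirectParquetOutputCommitter")

// One of several table writes (Parquet on S3)
val df = sqlContext.read.parquet("s3://my-bucket/source/")
df.write.mode("overwrite").parquet("s3://my-bucket/warehouse/table_a/")
```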

Please let me know if anyone knows a way to solve this on Qubole. Thanks.

ManmeetP

1 Answer


S3 is not a consistent filesystem; it is an eventually consistent object store, whose listing operations tend to briefly lag behind the files that have been created.

Any code that assumes written data is observably "there" when you look for it can break in this world. Sorry.
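As a rough illustration (the bucket and path are placeholders), this is the kind of write-then-list sequence a committer performs under the covers, and exactly the pattern that can fail on S3:

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

// Placeholder S3 path; assumes an S3 filesystem connector is configured
val file = new Path("s3://my-bucket/output/_temporary/part-00000")
val fs = FileSystem.get(file.toUri, new Configuration())

// Write an object...
val out = fs.create(file)
out.write("example".getBytes("UTF-8"))
out.close()

// ...then immediately list its directory. On an eventually consistent store
// the listing may not show the object yet, so a committer that lists
// _temporary and renames what it finds can throw FileNotFoundException.
fs.listStatus(file.getParent).foreach(status => println(status.getPath))
```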

stevel