I have been using S3 for checkpointing with Structured Streaming. However, I am getting a FileNotFoundException related to eventual consistency in S3.
Below is what I currently have for S3 checkpointing:
val msg = testMsgs.writeStream
  .option("checkpointLocation", "s3://<bucket-name>/checkpoint123")
  .foreach(writer)
  .start()
I am planning to switch to EMRFS, as my Spark job runs on EMR.
How reliable is EMRFS and how do I use EMRFS for checkpointing?
Will there be a change in the way we implement checkpointing?
How do I enable EMRFS in EMR?