
In my Spark Streaming application, I'm trying to stream data from Azure EventHub and write it to a couple of directories in the HDFS blob store, based on the data. I basically followed this link: multiple writeStream with spark streaming

Below is the code:

import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.streaming.{OutputMode, StreamingQuery}

def writeStreamer(input: DataFrame, checkPointFolder: String, output: String): StreamingQuery = {
  input
    .writeStream
    .format("com.databricks.spark.avro")
    .partitionBy("year", "month", "day")
    .option("checkpointLocation", checkPointFolder)
    .option("path", output)
    .outputMode(OutputMode.Append)
    .start()
}

val query0 = writeStreamer(dtcFinalDF, "/qmctdl/DTC_CheckPoint", "/qmctdl/DTC_DATA")

val query1 = writeStreamer(canFinalDF, "/qmctdl/CAN_CheckPoint", "/qmctdl/CAN_DATA")

query1.awaitTermination()

What I currently observe is that data is written successfully to the "/qmctdl/CAN_DATA" directory, but no data is getting written to "/qmctdl/DTC_DATA". Am I doing anything wrong here? Any help would be much appreciated.

chaitra k

2 Answers


Take a look at this answer: Executing separate streaming queries in spark structured streaming

I don't know about Azure EventHub specifically, but I think one stream is reading all the data and the other stream doesn't get served any data.
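One way to avoid the two sinks competing for the same events (a sketch, not a verified fix: it assumes the azure-eventhubs-spark connector, and the option keys, connection string placeholder, and consumer group names are illustrative and may differ by connector version) is to give each query its own source DataFrame reading with its own consumer group:

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

val spark = SparkSession.builder.appName("eventhub-two-sinks").getOrCreate()

// One readStream per query, each with its own consumer group, so each
// streaming query receives its own copy of the events.
def eventHubStream(consumerGroup: String): DataFrame =
  spark.readStream
    .format("eventhubs")
    .option("eventhubs.connectionString", "<connection-string>") // placeholder
    .option("eventhubs.consumerGroup", consumerGroup)            // per-query group
    .load()

val dtcSourceDF = eventHubStream("dtc-consumer-group")
val canSourceDF = eventHubStream("can-consumer-group")
// ...derive dtcFinalDF and canFinalDF from these separate sources,
// then call writeStreamer on each as in the question...
```

With a single shared source, whichever query pulls first can drain the partition offsets for that consumer group, which matches the symptom of one directory receiving all the data.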

wypul

Can you try this:

spark.streams.awaitAnyTermination() 

Instead of

query1.awaitTermination()
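Put together (a minimal sketch; the variable names follow the question, and `spark` is assumed to be the active SparkSession), the idea is to keep references to both queries and block until any of them terminates, so a failure in either stream surfaces instead of waiting only on query1:

```scala
// Start both streaming queries and keep their handles.
val query1 = writeStreamer(dtcFinalDF, "/qmctdl/DTC_CheckPoint", "/qmctdl/DTC_DATA")
val query2 = writeStreamer(canFinalDF, "/qmctdl/CAN_CheckPoint", "/qmctdl/CAN_DATA")

// Blocks until ANY active query on this session terminates or fails,
// rather than blocking on query1 alone.
spark.streams.awaitAnyTermination()
```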
Venkata