We want to listen to an Azure Event Hub and write the data to a Delta table in Azure Databricks. We have created the following code in a notebook:
df = spark.readStream.format("eventhubs").options(**ehConf).load()
# Code omitted where message content is expanded into columns in the dataframe
df.writeStream \
    .format("delta") \
    .outputMode("append") \
    .option("checkpointLocation", "/tmp/delta/events/_checkpoints/") \
    .toTable("mydb.mytable")
This code works perfectly, and the notebook stays on the df.writeStream row until the job is canceled.
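One idea I had is to keep the notebook as the driver and wrap the query in a restart loop, so that after a failure it starts again from the same checkpoint. A rough sketch of what I mean (error handling deliberately simplified, same placeholder table and checkpoint path as above):

import time

while True:
    query = df.writeStream \
        .format("delta") \
        .outputMode("append") \
        .option("checkpointLocation", "/tmp/delta/events/_checkpoints/") \
        .toTable("mydb.mytable")
    try:
        # Blocks until the query stops; raises if the query terminated with an error
        query.awaitTermination()
    except Exception as e:
        print(f"Streaming query failed, restarting from checkpoint: {e}")
        time.sleep(10)  # short back-off before restarting

But I am not sure this is the recommended way to do it on Databricks.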
How should I set this up so it runs "eternally" and ideally restarts if the code crashes? Should I run it as a normal workflow and, e.g., schedule it to run every minute but set Max concurrent runs to 1?
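If the scheduled-workflow route is the better fit, I assume the query would need a trigger that lets each run finish on its own instead of blocking forever, e.g. trigger(availableNow=True) on newer runtimes or trigger(once=True) on older ones; I am not sure whether the Event Hubs connector supports availableNow. A sketch of that variant, with the same placeholder table and checkpoint path:

df.writeStream \
    .format("delta") \
    .outputMode("append") \
    .trigger(availableNow=True) \
    .option("checkpointLocation", "/tmp/delta/events/_checkpoints/") \
    .toTable("mydb.mytable")

As far as I understand, the every-minute schedule with Max concurrent runs set to 1 would then just pick up whatever arrived since the previous run, because progress is tracked in the checkpoint.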