We have a Databricks job that has suddenly started failing consistently. Sometimes it runs for an hour before failing; other times it fails after a few minutes.
The inner exception is:
ERROR MicroBatchExecution: Query [id = xyz, runId = abc] terminated with error
shaded.databricks.org.apache.hadoop.fs.azure.AzureException: hadoop_azure_shaded.com.microsoft.azure.storage.StorageException: Could not verify copy source.
The job runs a notebook that consumes from an Event Hub with PySpark Structured Streaming, calculates some values based on the data, and streams the results back to another Event Hub topic.
The cluster comes from a pool, with 2 workers and 1 driver, running the standard Databricks Runtime 9.1 ML.
We've tried restarting the job many times, including with clean input data and a fresh checkpoint location, but we still can't determine what is causing this error. We don't see any 403 Forbidden errors in the logs, which forums sometimes mention as a cause. Any assistance is greatly appreciated.
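
For anyone who wants to repeat the log check, here is a minimal sketch of how the driver log can be searched for the storage error and for 403 responses. It assumes the log has been exported as plain text; the function name `scan_log` and the pattern names are made up for illustration and are not part of any Databricks or Azure API.

```python
import re
from typing import Dict, Iterable, List

# Illustrative patterns only (an assumption, not an exhaustive list
# of Azure storage failure signatures).
PATTERNS = {
    "storage_exception": re.compile(r"StorageException"),
    "copy_source": re.compile(r"Could not verify copy source"),
    "forbidden": re.compile(r"\b403\b|Forbidden"),
}


def scan_log(lines: Iterable[str]) -> Dict[str, List[int]]:
    """Return the 1-based line numbers of log lines matching each pattern."""
    hits: Dict[str, List[int]] = {name: [] for name in PATTERNS}
    for number, line in enumerate(lines, start=1):
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                hits[name].append(number)
    return hits


if __name__ == "__main__":
    # The two log lines quoted above, used as sample input.
    sample = [
        "ERROR MicroBatchExecution: Query [id = xyz, runId = abc] terminated with error",
        "shaded.databricks.org.apache.hadoop.fs.azure.AzureException: "
        "hadoop_azure_shaded.com.microsoft.azure.storage.StorageException: "
        "Could not verify copy source.",
    ]
    print(scan_log(sample))
```

On the two lines quoted above, this reports the storage exception on the second line and no 403/Forbidden matches, which is consistent with what we see in the full logs.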