How does Spark readStream code get triggered in Databricks Auto Loader?
I understand it is an event-driven process and that a new-file notification causes the file to be consumed.
Should the code below be run as a job? If so, how are the notifications useful?
What happens when the code below is executed? What is the sequence of steps that takes place when processing files with the notification mechanism in Databricks?
Once I run the code below, the command completes in 2 minutes.
df = spark.readStream.format("cloudFiles")\
.option("cloudFiles.useNotifications", True)\
.option("cloudFiles.format", "csv")\
.option("cloudFiles.connectionString", connection_string)\
.option("cloudFiles.resourceGroup", resource_group)\
.option("cloudFiles.subscriptionId", subscription_id)\
.option("cloudFiles.tenantId", tenant_id)\
.option("cloudFiles.clientId", client_id)\
.option("cloudFiles.clientSecret", secret)\
.option("cloudFiles.region", region)\
.option("header", true)\
.schema(dataset_schema)\
.option("cloudFiles.includeExistingFiles", True)\
.load(file_location)
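
For completeness, this is the kind of writeStream I assume would accompany the readStream above to actually start the query; the checkpoint location, trigger choice, and output path below are placeholders I made up, not part of my actual job:

# Assumed companion sink (placeholder paths): a streaming query only starts
# running once a writeStream is started against the readStream DataFrame.
query = (df.writeStream
         .format("delta")
         # Checkpoint tracks which files/notifications have been processed
         .option("checkpointLocation", "/mnt/checkpoints/autoloader_demo")
         # availableNow processes pending files and stops; omit it (or use
         # processingTime) for a continuously running stream
         .trigger(availableNow=True)
         .start("/mnt/output/autoloader_demo"))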