I'm trying to figure out whether a single task ever runs using all available cores on the executor. That is, if a stage contains only one task, is that task single-threaded, single-core processing on the executor, or could it use all available cores in a multithreaded fashion "under the covers"?
I'm running ETL jobs in Azure Databricks on one worker (hence one executor), and at one point in the pipeline a single job creates a single stage that runs a single task to process the entire dataset. The task takes a few minutes to complete.
I want to understand whether a single task can run functions on all available executor cores in parallel. In this case I deserialize JSON messages with the from_json function and save them as Parquet files. I'm worried this is a single-threaded process happening inside that one task.
// imports needed outside a Databricks notebook (SaveMode, from_json, $"..." syntax)
import org.apache.spark.sql.SaveMode
import org.apache.spark.sql.functions.from_json
import spark.implicits._

spark
  .read
  .table("input")
  // deserialize the JSON payload column using the message schema
  .withColumn("Payload", from_json($"Payload", schema))
  .write
  .mode(SaveMode.Append)
  .saveAsTable("output")