I have a question about how Spark writes its results after a computation. As I understand it, each executor writes its output back to HDFS or the local filesystem (depending on how the cluster and output path are configured) as soon as it finishes working on its partitions.
This makes sense: if the results don't need to be aggregated, there is no reason to wait for all executors to finish before anything is written.
But how does the write operation work when the data needs to be sorted on a particular column (e.g. ID) in ascending or descending order?
Does Spark's plan first sort the data by ID within each executor's partitions, before the rest of the computation even begins? If so, any executor could finish first and start writing its result to HDFS, so how does the framework make sure that the final output is globally sorted?
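
To make the concern concrete, here is a toy simulation in plain Python of what I am picturing. This is not Spark code, and the function names are made up for illustration. It assumes the IDs are split across partitions arbitrarily (round-robin here), each partition is sorted on its own, and the "part files" are then read back in partition-index order:

```python
def write_parts_locally_sorted(ids, num_partitions):
    """Toy model (not Spark): split ids round-robin into partitions,
    sort each partition independently, and return the resulting
    "part files" in partition-index order."""
    partitions = [ids[i::num_partitions] for i in range(num_partitions)]
    return [sorted(p) for p in partitions]


def is_globally_sorted(parts):
    """Concatenate the part files in order and check the overall ordering."""
    flat = [x for part in parts for x in part]
    return flat == sorted(flat)


ids = [7, 2, 9, 4, 1, 8, 3, 6, 5]
parts = write_parts_locally_sorted(ids, 3)

print(parts)                      # [[3, 4, 7], [1, 2, 6], [5, 8, 9]]
print(is_globally_sorted(parts))  # False
```

Each part is sorted internally, but the concatenation is not, which is exactly the situation I don't understand how Spark avoids.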
Thanks in advance