What is the difference between forEachAsync and forEachPartitionAsync?
If I had to guess, I would say the following, but please correct me if I am wrong.
forEachAsync: iterates through the values from all partitions one by one in an async manner.
forEachPartitionAsync: fans out each partition and runs the lambda for each partition in parallel across different workers. The lambda iterates through the values of that partition one by one in an async manner.
But wait, RDD operations should in fact execute in parallel, right? So if I call rdd.forEachAsync, that should execute in parallel too, shouldn't it? I guess I am a little confused about what the difference between forEachAsync and forEachPartitionAsync really is, besides passing a Tuple versus an Iterator of Tuples to the lambda, respectively.
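
To make the question concrete, here is a plain-Java toy model of the two call shapes as I picture them. No Spark is involved: the class `AsyncForeachSketch`, the list-of-lists "partitions", and both static methods are hypothetical stand-ins, not the library's implementation. The sketch assumes that both variants run one task per partition and return a future immediately, and that the only difference is whether the lambda receives a single element or the partition's whole iterator. That assumption is the part I would like confirmed or corrected.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Consumer;

class AsyncForeachSketch {

    // Hypothetical per-partition variant: one task per partition,
    // and the lambda is invoked once with that partition's iterator.
    static <T> CompletableFuture<Void> forEachPartitionAsync(
            List<List<T>> partitions, Consumer<Iterator<T>> f) {
        List<CompletableFuture<Void>> tasks = new ArrayList<>();
        for (List<T> partition : partitions) {
            tasks.add(CompletableFuture.runAsync(() -> f.accept(partition.iterator())));
        }
        // The returned future completes once every partition task has finished.
        return CompletableFuture.allOf(tasks.toArray(new CompletableFuture[0]));
    }

    // Hypothetical per-element variant (assumption): still one task per
    // partition, but the lambda is invoked once per element inside that task.
    static <T> CompletableFuture<Void> forEachAsync(
            List<List<T>> partitions, Consumer<T> f) {
        return forEachPartitionAsync(partitions, it -> it.forEachRemaining(f));
    }

    public static void main(String[] args) {
        List<List<Integer>> partitions =
                List.of(List.of(1, 2, 3), List.of(4, 5), List.of(6));

        AtomicInteger elementCalls = new AtomicInteger();
        AtomicInteger partitionCalls = new AtomicInteger();

        // Both calls return right away; join() blocks until the work is done.
        forEachAsync(partitions, v -> elementCalls.incrementAndGet()).join();
        forEachPartitionAsync(partitions, it -> partitionCalls.incrementAndGet()).join();

        System.out.println("lambda calls, per-element variant: " + elementCalls.get());
        System.out.println("lambda calls, per-partition variant: " + partitionCalls.get());
    }
}
```

In this model the per-element lambda runs six times and the per-partition lambda runs three times, yet both variants process the partitions concurrently. If that matches the real behavior, the practical reason to pick the per-partition form would be doing per-partition setup (for example, opening one connection) once instead of once per element.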