I tried to test which Spark operations cause shuffling, based on this Stack Overflow post: LINK. However, it doesn't make sense to me that the `cartesian` operation doesn't cause a shuffle in Spark, since it has to move partitions across the network in order to combine them locally.
How does Spark actually perform its `cartesian` and `distinct` operations behind the scenes?
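
To make the distinction I'm asking about concrete, here is a minimal pure-Python sketch of my current understanding. It is not Spark code: partitions are modelled as plain lists, and the function names (`cartesian_partitions`, `distinct_partitions`) and the sample data are made up for illustration. The assumption is that a cartesian product can build each output partition from exactly one whole partition of each input, whereas `distinct` has to regroup individual elements by hash so that equal values end up in the same place.

```python
from itertools import product


def cartesian_partitions(left, right):
    """Model of a cartesian product over partitioned data.

    Each output partition is built from exactly one partition of `left`
    and one partition of `right`. Whole partitions are read as they are;
    no element is routed by key.
    """
    return [list(product(lp, rp)) for lp in left for rp in right]


def distinct_partitions(parts, num_out):
    """Model of distinct: route every element to a bucket chosen by its
    hash, then deduplicate inside each bucket."""
    buckets = [set() for _ in range(num_out)]
    for part in parts:
        for x in part:
            # Equal values always hash to the same bucket, so duplicates
            # from different input partitions meet in one place.
            buckets[hash(x) % num_out].add(x)
    return [sorted(b) for b in buckets]


# Two inputs with two partitions each.
left = [[1, 2], [3]]
right = [["a"], ["b", "c"]]
pairs = cartesian_partitions(left, right)

# Duplicates are spread over two input partitions.
dups = [[1, 2, 2], [2, 3, 1]]
uniq = distinct_partitions(dups, 2)

print(len(pairs), "cartesian output partitions")
print(uniq)
```

If this model is right, the cartesian case still has to fetch remote partitions, but it never redistributes elements by key, which might be why it is not counted as a shuffle. I'd like to know whether that matches what Spark really does.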