I have read in several places that transformations involving a shuffle stage should be avoided where possible, because shuffling sends data over the network between nodes, which can carry a high performance cost.
I was looking for a list of Spark transformations that might cause a shuffle on DataFrames in Spark 2.4+, but all I could find is this question regarding the old RDD API.
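For context, one way to check a single query without such a list is to look for Exchange nodes in its physical plan. The following is only a minimal sketch. It assumes that shuffles are printed as "Exchange ..." in the output of df.explain() and that explain() prints through Python's stdout, both of which may vary between Spark versions. The helper names count_shuffles, physical_plan and demo are hypothetical, not part of Spark.

```python
import contextlib
import io
import re

# Assumption: shuffle nodes print as "Exchange ..." in the physical plan.
# The word boundary skips "BroadcastExchange" and "ReusedExchange",
# which do not start a new shuffle of their own.
_SHUFFLE_NODE = re.compile(r"\bExchange\b")


def count_shuffles(plan_text):
    """Count the plan lines that contain a shuffle Exchange node."""
    return sum(1 for line in plan_text.splitlines() if _SHUFFLE_NODE.search(line))


def physical_plan(df):
    """Capture the text that df.explain() prints to stdout."""
    buf = io.StringIO()
    with contextlib.redirect_stdout(buf):
        df.explain()
    return buf.getvalue()


def demo():
    """Compare a narrow and a wide transformation (needs pyspark installed)."""
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = (
        SparkSession.builder.master("local[2]").appName("shuffle-check").getOrCreate()
    )
    df = spark.range(1000).withColumn("key", F.col("id") % 10)

    narrow = df.filter(F.col("id") > 100)   # filter works partition by partition
    wide = df.groupBy("key").count()        # grouping repartitions rows by key

    print("filter :", count_shuffles(physical_plan(narrow)))
    print("groupBy:", count_shuffles(physical_plan(wide)))
    spark.stop()
```

Calling demo() on a machine with pyspark installed prints the number of Exchange nodes found for each query. This only tells me about one query at a time, though, which is why I am after a general list.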