I know the difference between `map` and `mapPartitions`, which operate on individual elements and on iterators of elements, respectively. When should I use which? If the overhead is similar, why would I ever use `mapPartitions`, since `map` is easier to write?
`RDD.map` applies a function to each element of an RDD, whereas `RDD.mapPartitions` applies a function to each partition of an RDD; the function receives an iterator over that partition's elements.
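To make the difference concrete without a Spark cluster, here is a minimal plain-Python model. It is an assumption-laden sketch, not Spark's API: an "RDD" is just a list of partitions (each a list), and `rdd_map` / `rdd_map_partitions` are hypothetical stand-ins for `RDD.map` and `RDD.mapPartitions`. The counters show how often each function is invoked.

```python
from typing import Callable, Iterable, Iterator, List

# Hypothetical stand-in for an RDD: a list of partitions, each a plain list.
Partitions = List[List[int]]


def rdd_map(parts: Partitions, f: Callable[[int], int]) -> Partitions:
    """Model of RDD.map: f is called once for every element."""
    return [[f(x) for x in part] for part in parts]


def rdd_map_partitions(
    parts: Partitions, f: Callable[[Iterator[int]], Iterable[int]]
) -> Partitions:
    """Model of RDD.mapPartitions: f is called once per partition with an iterator."""
    return [list(f(iter(part))) for part in parts]


calls = {"map": 0, "mapPartitions": 0}


def double(x: int) -> int:
    calls["map"] += 1  # runs once per element
    return 2 * x


def double_all(it: Iterator[int]) -> Iterator[int]:
    # Runs once per partition; any per-partition setup would go here.
    calls["mapPartitions"] += 1
    for x in it:
        yield 2 * x


data = [[1, 2, 3], [4, 5]]  # 5 elements in 2 partitions
print(rdd_map(data, double))  # [[2, 4, 6], [8, 10]]
print(rdd_map_partitions(data, double_all))  # [[2, 4, 6], [8, 10]]
print(calls)  # {'map': 5, 'mapPartitions': 2}
```

Both produce the same result here, but the per-partition function runs 2 times instead of 5. That is the usual reason to reach for `mapPartitions`: expensive setup (for example, opening a connection or loading a model) can be done once per partition rather than once per element.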
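`map` produces exactly one output element per input element, so it will not change the number of elements in an RDD, while `mapPartitions` may return any number of elements per partition and might very well do so.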
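A small sketch of that point, again using a hypothetical plain-Python model (a list of partitions) rather than Spark itself. Each per-partition function below emits a different number of elements than it consumes, which a per-element `map` cannot do.

```python
from typing import Callable, Iterable, Iterator, List

# Hypothetical stand-in for an RDD: a list of partitions, each a plain list.
Partitions = List[List[int]]


def rdd_map_partitions(
    parts: Partitions, f: Callable[[Iterator[int]], Iterable[int]]
) -> Partitions:
    """Model of RDD.mapPartitions: f gets an iterator, may yield any number of items."""
    return [list(f(iter(part))) for part in parts]


def count(parts: Partitions) -> int:
    """Total number of elements across all partitions."""
    return sum(len(part) for part in parts)


def partition_sum(it: Iterator[int]) -> Iterator[int]:
    # Consumes the whole partition and emits a single element.
    yield sum(it)


def evens_only(it: Iterator[int]) -> Iterator[int]:
    # Emits zero or more elements per partition.
    return (x for x in it if x % 2 == 0)


def with_negatives(it: Iterator[int]) -> Iterator[int]:
    # Emits two elements for every input element.
    for x in it:
        yield x
        yield -x


data = [[1, 2, 3], [4, 5]]
print(count(data))  # 5
print(rdd_map_partitions(data, partition_sum))  # [[6], [9]]
print(rdd_map_partitions(data, evens_only))  # [[2], [4]]
print(count(rdd_map_partitions(data, with_negatives)))  # 10
```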
See also this answer and comments on a similar question.