It could be for a number of reasons, and to answer fully we would need to see the code.
Even if the time is being spent at saveAsTextFile(), the operation causing it may be a different one. My hunch is that before the save operation, you are using reduceByKey or a GROUP BY.
Now, those operations can be problematic if you have skewed data, that is, unbalanced data where most of the records belong to just a few keys. For instance, if you are grouping by US state, there are only 50 keys, so at most 50 tasks would actually do work; even if you have 250 tasks in total, the other 200 or more won't have any input.
Or let's say you're grouping your users by country, but most of your users are from the US: you'd have one task processing most of the data and finishing much later than the others.
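To make the two cases above concrete, here is a small plain-Python simulation (no Spark needed). It is only a sketch: partition_sizes is a hypothetical helper, and the CRC32-modulo routing is a stand-in for hash partitioning in general, not Spark's actual partitioner. The record counts are made up for illustration.

```python
import zlib
from collections import Counter


def partition_sizes(keys, num_partitions):
    """Count records per partition when every record is routed by a hash of its key.

    CRC32 is used only because it is deterministic across runs; it is an
    assumption standing in for whatever hash the real partitioner uses.
    """
    counts = Counter(
        zlib.crc32(str(k).encode("utf-8")) % num_partitions for k in keys
    )
    return [counts[p] for p in range(num_partitions)]


# Case 1: few distinct keys. 50 "states", 1000 records each, 250 partitions.
state_keys = [f"state_{i:02d}" for i in range(50)] * 1000
state_sizes = partition_sizes(state_keys, 250)
busy_state_tasks = sum(1 for size in state_sizes if size > 0)
print("tasks with input:", busy_state_tasks, "of", len(state_sizes))

# Case 2: one dominant key. 90% of the records share the key "US".
country_keys = ["US"] * 90_000 + ["FR"] * 5_000 + ["BR"] * 5_000
country_sizes = partition_sizes(country_keys, 250)
print("largest partition:", max(country_sizes), "of", sum(country_sizes), "records")
```

In the first case no more than 50 partitions can ever receive data, because all records with the same key land in the same partition. In the second case the partition that receives "US" holds at least 90% of the records, so its task finishes long after the rest.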
So, what you have to do is look at any operation that performs a grouping/reducing before the save, and look at the data to see if there's any skew.
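One possible way to check for skew is to count records per key on a sample and compare the largest key against the average. The sketch below is plain Python, and skew_report is a hypothetical helper, not part of Spark. It assumes you can pull a list of keys to the driver, for example with rdd.keys().sample(False, 0.01).collect() on a key-value RDD (the 0.01 fraction is an arbitrary choice), or with rdd.countByKey() if the number of distinct keys is small.

```python
from collections import Counter


def skew_report(keys, top_n=5):
    """Summarize how unevenly records are spread across keys.

    Returns the number of distinct keys, the top_n keys with their counts and
    share of all records, and the ratio of the largest key to the mean key size.
    """
    counts = Counter(keys)
    if not counts:
        raise ValueError("no keys to analyze")
    total = sum(counts.values())
    top = counts.most_common(top_n)
    mean_per_key = total / len(counts)
    return {
        "distinct_keys": len(counts),
        "top_keys": [(key, count, count / total) for key, count in top],
        "max_over_mean": top[0][1] / mean_per_key,
    }


# Made-up sample: 90 of 100 users are from the US.
sample_keys = ["US"] * 90 + ["FR"] * 5 + ["BR"] * 5
report = skew_report(sample_keys)
print(report["distinct_keys"])   # number of distinct keys in the sample
print(report["top_keys"][0])     # heaviest key, its count, and its share
print(report["max_over_mean"])   # how much larger the heaviest key is than average
```

A small distinct_keys value compared with your number of tasks points to the first problem (idle tasks), while a large share for the top key or a high max_over_mean points to the second (one straggler task).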