In the sort & spill process, which key is the start of a partition and which is the start of another?
- Is this Spark Core or SQL? Can you show the exact code snippet you are thinking about, to make sure we're talking about the same things? – Jacek Laskowski May 30 '17 at 06:48
- Possible duplicate of [How does HashPartitioner work?](https://stackoverflow.com/questions/31424396/how-does-hashpartitioner-work) – zero323 May 30 '17 at 07:47
- It is Spark Core, as shown in the [figure](https://0x0fff.com/wp-content/uploads/2015/08/spark_hash_shuffle_with_consolidation.png). Thanks, @Jacek Laskowski. – CCong May 30 '17 at 12:14
1 Answer
Regardless of whether it's Spark Core (with RDDs) or Spark SQL (with Datasets), the default partitioner is `HashPartitioner`, where the hash of a key determines the partition:

> A `org.apache.spark.Partitioner` that implements hash-based partitioning using Java's `Object.hashCode`.
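A minimal sketch in Java of what that means for "where a partition starts". `getPartition` mirrors the documented behaviour of `HashPartitioner.getPartition` (a non-negative modulo of `hashCode`, with `null` keys going to partition 0). The class `HashPartitionSketch` and the helpers `sortByPartition` and `partitionStarts` are hypothetical and are not Spark code. The sketch also assumes, roughly as in Spark's sort-based shuffle, that buffered records are ordered by partition ID before they are spilled, so each partition occupies one contiguous run. Under that assumption a partition starts at the first record whose partition ID is `p`, not at any particular key value.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class HashPartitionSketch {

    // Non-negative modulo of the key's hashCode; null keys go to partition 0.
    public static int getPartition(Object key, int numPartitions) {
        if (key == null) {
            return 0;
        }
        int raw = key.hashCode() % numPartitions;
        return raw < 0 ? raw + numPartitions : raw;
    }

    // Hypothetical helper: stable sort by partition ID, so records of one
    // partition form a contiguous run and keep their original relative order.
    public static List<String> sortByPartition(List<String> keys, int numPartitions) {
        List<String> sorted = new ArrayList<>(keys);
        sorted.sort(Comparator.comparingInt(k -> getPartition(k, numPartitions)));
        return sorted;
    }

    // Hypothetical helper: index of the first record of each partition in the
    // partition-sorted list, or -1 if that partition received no records.
    public static int[] partitionStarts(List<String> sortedKeys, int numPartitions) {
        int[] starts = new int[numPartitions];
        Arrays.fill(starts, -1);
        for (int i = 0; i < sortedKeys.size(); i++) {
            int p = getPartition(sortedKeys.get(i), numPartitions);
            if (starts[p] == -1) {
                starts[p] = i;
            }
        }
        return starts;
    }

    public static void main(String[] args) {
        int numPartitions = 3;
        List<String> keys = List.of("apple", "banana", "cherry", "date", "elderberry");
        List<String> sorted = sortByPartition(keys, numPartitions);
        for (String k : sorted) {
            System.out.println("partition " + getPartition(k, numPartitions) + ": " + k);
        }
        System.out.println("start index per partition: "
                + Arrays.toString(partitionStarts(sorted, numPartitions)));
    }
}
```

The design choice to illustrate: which keys land together is decided only by `hashCode % numPartitions`, so neighbouring keys in sort order can end up in different partitions, and partition boundaries are simply the points where the partition ID changes.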

Jacek Laskowski
- Let's have this conversation after I get all the needed info from the OP :) If it's Spark SQL, that answer might get trickier (and won't be a duplicate). – Jacek Laskowski May 30 '17 at 07:47
- Well, details aside, the docs you quote are irrelevant / incorrect for `Datasets`. – zero323 May 30 '17 at 07:51