How to determine which partition a key goes to in Spark's sort shuffle

Question

In the sort and spill process, how is it decided which key starts one partition and which key starts another?

Is this Spark Core or SQL? Can you show the exact code snippet you have in mind, so we can be sure we're talking about the same thing? — Jacek Laskowski, May 30 '17 at 06:48
Possible duplicate of [How does HashPartitioner work?](https://stackoverflow.com/questions/31424396/how-does-hashpartitioner-work) — zero323, May 30 '17 at 07:47
It is Spark Core, as shown in the [figure](https://0x0fff.com/wp-content/uploads/2015/08/spark_hash_shuffle_with_consolidation.png). Thanks, @Jacek Laskowski. — CCong, May 30 '17 at 12:14

score 0 · Answer 1 · answered May 30 '17 at 06:49

Regardless of whether it's Spark Core (with RDDs) or Spark SQL (with Datasets), the default partitioner is HashPartitioner, where the hash of a key determines its partition. As the docs put it:

> A org.apache.spark.Partitioner that implements hash-based partitioning using Java's Object.hashCode.
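
To make the quoted description concrete, here is a minimal, Spark-free sketch of hash-based partitioning: take the key's `hashCode`, reduce it modulo the number of partitions, and keep the result non-negative. The class and method names (`HashPartitionSketch`, `partitionFor`, `nonNegativeMod`) are made up for illustration, and the code only mirrors the idea rather than reproducing Spark's source.

```java
// Sketch only: mirrors the hash-then-modulo idea, not Spark's actual source.
public class HashPartitionSketch {

    // Java's % can return a negative value, so shift it into [0, mod).
    static int nonNegativeMod(int x, int mod) {
        int rawMod = x % mod;
        return rawMod < 0 ? rawMod + mod : rawMod;
    }

    // Hypothetical helper: partition index for a key, given the partition count.
    static int partitionFor(Object key, int numPartitions) {
        if (key == null) {
            return 0; // assumption in this sketch: null keys go to partition 0
        }
        return nonNegativeMod(key.hashCode(), numPartitions);
    }

    public static void main(String[] args) {
        int numPartitions = 4;
        Object[] keys = {7, -1, "a", null};
        for (Object key : keys) {
            System.out.println(key + " -> partition " + partitionFor(key, numPartitions));
        }
    }
}
```

As far as I understand the sort shuffle, map-side records are ordered by this partition ID before they are written out, so a partition begins where the ID changes rather than at any particular key value.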

Jacek Laskowski

I beg to disagree :) – zero323 May 30 '17 at 07:46
Let's have this conversation after I've got all the needed info from the OP :) If it's Spark SQL, the answer might get trickier (and won't be a duplicate). – Jacek Laskowski May 30 '17 at 07:47
Well, details aside, docs you quote are irrelevant / incorrect for `Datasets`. – zero323 May 30 '17 at 07:51
