In the sort & spill process, which key marks where one partition ends and another begins?

zero323
CCong
  • Is this Spark Core or SQL? Can you show the exact code snippet you think about to make sure we're talking about the same things? – Jacek Laskowski May 30 '17 at 06:48
  • Possible duplicate of [How does HashPartitioner work?](https://stackoverflow.com/questions/31424396/how-does-hashpartitioner-work) – zero323 May 30 '17 at 07:47
  • It is Spark Core, as shown in the [figure](https://0x0fff.com/wp-content/uploads/2015/08/spark_hash_shuffle_with_consolidation.png). Thanks. @Jacek Laskowski – CCong May 30 '17 at 12:14

1 Answer

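Regardless of whether it's Spark Core (with RDDs) or Spark SQL (with Datasets), the default partitioner is HashPartitioner, where the partition of a record is computed from the hash of its key (the key's hashCode modulo the number of partitions, kept non-negative) rather than from a range of keys: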

A org.apache.spark.Partitioner that implements hash-based partitioning using Java's Object.hashCode.
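
To make the hash-to-partition mapping concrete, here is a minimal sketch in plain Java that mirrors the idea without depending on Spark. The class name `HashPartitionSketch` and the method `partitionFor` are hypothetical names chosen for illustration, and sending a null key to partition 0 is stated here as an assumption about the behaviour being imitated.

```java
// Illustrative sketch only: not Spark's source code.
class HashPartitionSketch {
    // Java's % operator can return a negative remainder for a negative hash,
    // so shift the result back into the range [0, mod).
    static int nonNegativeMod(int x, int mod) {
        int raw = x % mod;
        return raw < 0 ? raw + mod : raw;
    }

    // Map a key to a partition index using Object.hashCode.
    // Assumption: a null key goes to partition 0.
    static int partitionFor(Object key, int numPartitions) {
        if (key == null) {
            return 0;
        }
        return nonNegativeMod(key.hashCode(), numPartitions);
    }

    public static void main(String[] args) {
        int numPartitions = 4;
        Object[] keys = {"a", "b", 7, -1, null};
        for (Object key : keys) {
            System.out.println(key + " -> partition " + partitionFor(key, numPartitions));
        }
    }
}
```

With a mapping like this, keys that are adjacent in sort order (such as "a" and "b") can land in different partitions, so a partition is the set of keys sharing a hash remainder and not a contiguous key range with a first key.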

Jacek Laskowski