Spark partition by key

Asked Feb 22 '16 at 11:59

Active Feb 22 '16 at 11:59

Viewed 100 times

What is the difference between these two kinds of partitioning in Spark?

For example, I load a text file toto.csv from disk into a Spark cluster:

val text = sc.textFile("toto.csv", 100)

=> This splits my file into 100 fragments without any "rules".

After that, if I do:

val partion = text.partitionBy(new HashPartitioner(100))

=> This "splits" my file into 100 partitions by key.
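To make "by key" concrete, here is a minimal sketch in plain Scala (no Spark dependency) of the rule a hash partitioner applies: the key's hashCode modulo the number of partitions, adjusted to be non-negative. The names partitionFor and keyOf are hypothetical helpers, and treating the first comma-separated field as the key is an assumption about toto.csv, not something known from the file.

```scala
object HashPartitionSketch {
  // Mirrors the rule used by hash partitioning: key.hashCode modulo the
  // partition count, shifted to be non-negative; null keys go to partition 0.
  def partitionFor(key: Any, numPartitions: Int): Int =
    if (key == null) 0
    else {
      val raw = key.hashCode % numPartitions
      if (raw < 0) raw + numPartitions else raw
    }

  // Hypothetical key extractor (assumption): the first comma-separated field.
  def keyOf(line: String): String = line.split(",", 2)(0)

  def main(args: Array[String]): Unit = {
    // Made-up sample rows, only for illustration.
    val lines = Seq("a,1", "b,2", "a,3", "c,4")

    // Group the rows by the partition index their key hashes to.
    val byPartition = lines.groupBy(line => partitionFor(keyOf(line), 100))

    byPartition.toSeq.sortBy(_._1).foreach { case (p, rows) =>
      println(s"partition $p -> ${rows.mkString(" | ")}")
    }
  }
}
```

Rows sharing a key always land in the same partition index, which is the difference from the fragments produced by textFile. Note that in Spark, partitionBy is only defined on RDDs of key-value pairs, so the lines would first need a key, for example something like text.keyBy(keyOf).partitionBy(new HashPartitioner(100)), again assuming the first field is the key.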

Thanks for any confirmation or suggestions.

minh-hieu.pham

See chapter 4 for a detailed explanation: https://www.safaribooksonline.com/library/view/learning-spark/9781449359034/ch04.html – GameOfThrows Feb 22 '16 at 12:07

0 Answers