
Apache Spark shell context: how do you set the number of partitions when using the shell? It is not clear in the documentation I am reviewing. Is the default just 2 partitions?

1 Answer

But the number of partitions for what? There are many different parameters in Spark (e.g. spark.sql.shuffle.partitions for shuffling, spark.default.parallelism for transformations on RDDs). You can also change the number of partitions of a Dataset/DataFrame with coalesce/repartition, etc., as sketched below.
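For example, here is a minimal sketch of those knobs in spark-shell (Scala); the name df and the partition counts are illustrative, not taken from your question:

```scala
// Shuffle partitions used by DataFrame/Dataset operations such as
// joins and aggregations (the built-in default is 200):
spark.conf.set("spark.sql.shuffle.partitions", "50")

// spark.default.parallelism (used by RDD transformations) must be set
// when the shell is launched, e.g.:
//   spark-shell --conf spark.default.parallelism=8

// Changing the partition count of an existing Dataset:
val df = spark.range(0, 1000000)      // illustrative Dataset
val wider    = df.repartition(100)    // full shuffle; can increase or decrease
val narrower = df.coalesce(10)        // avoids a full shuffle; can only decrease

println(wider.rdd.getNumPartitions)    // 100
println(narrower.rdd.getNumPartitions) // 10
```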

There is also a different default number of partitions for datasets depending on whether you work on your local PC or on a Hadoop cluster; you can inspect it as shown below.
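A quick way to check the defaults in your own shell (a sketch; the values depend on your deployment, and locally they usually follow the number of cores):

```scala
// Inside spark-shell:
sc.defaultParallelism                          // default for RDD operations
sc.parallelize(1 to 100).getNumPartitions      // partitions of a freshly created RDD
spark.conf.get("spark.sql.shuffle.partitions") // default "200" for SQL shuffles
```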

So you need to specify what exactly you want to set the number of partitions for.

Here are some good links that could clarify your question further:

How does Spark partition(ing) work on files in HDFS?

Spark Partitions: Loading a file from the local file system on a Single Node Cluster

Tomasz Krol
  • I have seen that the default is the number of cores of the machine when working in standalone mode. I mean partitions for a map-reduce operation. –  Sep 04 '18 at 23:04