Apache Spark shell context: how do you set the number of partitions when using the shell? It is not clear in the doc I am reviewing. Is the default just 2 partitions?
- The number of partitions for what? JOINing, saving output? – thebluephantom Sep 04 '18 at 23:31
- Processing and transforming a large dataset in parallel. The default in standalone mode is the number of cores. – Sep 08 '18 at 22:33
- The answer below concurs with my comment. I think you may need to redefine the question, as it could be considered too broad. – thebluephantom Sep 09 '18 at 02:39
1 Answer
But the number of partitions for what? There are several different parameters in Spark (e.g. spark.sql.shuffle.partitions for shuffling, spark.default.parallelism when you do transformations with RDDs). You can also change the number of partitions for a Dataset/DataFrame with coalesce/repartition, etc.
The default number of partitions for datasets also differs depending on whether you work on your local PC or on a Hadoop cluster.
You need to specify exactly which partition setting you want to change.
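For the spark-shell specifically, those parameters can be passed as --conf flags at launch. A minimal sketch (the master URL and the values 4 and 8 are placeholders for illustration, not recommendations):

```shell
# Start spark-shell with explicit partition-related settings
# (tune these values for your own workload and cluster).
spark-shell \
  --master local[4] \
  --conf spark.default.parallelism=8 \
  --conf spark.sql.shuffle.partitions=8
```

Inside the shell you can then inspect and adjust per dataset, e.g. with `sc.defaultParallelism`, `rdd.getNumPartitions`, `df.repartition(8)`, or `df.coalesce(4)`.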
Here are some good links, that could clarify your question more:
How does Spark partition(ing) work on files in HDFS?
Spark Partitions: Loading a file from the local file system on a Single Node Cluster

Tomasz Krol
- I have seen that the default is the number of cores of the machine when working in standalone mode. I mean partitions for a map-reduce operation. – Sep 04 '18 at 23:04