If I load a CSV file through Spark's textFile API, is my RDD partitioned? If yes, how many partitions does it have? And could someone explain the meaning of default parallelism in Apache Spark?

Abhiram
1 Answer
Alberto Bonsanto's comment links to a post that describes how partitioning works in Spark.
To answer your question about the number of partitions: you can run the following to find out how many partitions an RDD has.
In Python:
# Distribute a small local range as an RDD, then check its partition count.
rdd = sc.parallelize(range(1, 10))
print(rdd.getNumPartitions())
In Scala:
// Distribute a local range as an RDD, then check its partition count.
val rdd = sc.parallelize(1 to 100)
println(rdd.partitions.length)
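For the CSV case in the question: yes, an RDD created with sc.textFile is partitioned, and the partition count comes mainly from the file's input splits (roughly one per HDFS block), not from spark.default.parallelism. That setting mostly governs RDDs that are not backed by files, such as sc.parallelize without an explicit numSlices. A minimal sketch in Python (the CSV path is hypothetical):

# Partition count of a file-based RDD is driven by input splits;
# the optional minPartitions argument asks for at least that many.
lines = sc.textFile("/path/to/data.csv")  # hypothetical path
print(lines.getNumPartitions())

# Default parallelism, used e.g. for sc.parallelize without numSlices:
print(sc.defaultParallelism)

# Requesting a minimum number of partitions at read time:
more = sc.textFile("/path/to/data.csv", minPartitions=8)
print(more.getNumPartitions())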
If you have a DataFrame, you can call df.rdd to get the underlying RDD and check its partitioning the same way.
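For example, in Python, using a small DataFrame built in-line (assuming an existing SparkSession named spark):

# spark.range produces a DataFrame with a single "id" column.
df = spark.range(100)
print(df.rdd.getNumPartitions())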

MrChristine