If I load a CSV file through Spark's text file API, is my RDD partitioned? If so, into how many partitions? Also, could someone explain the meaning of default parallelism in Apache Spark?

Alberto Bonsanto
Abhiram

1 Answer

Alberto Bonsanto's comment links to a post that describes how partitioning works in Spark.

To answer your question about the number of partitions: you can run the following to find out how many partitions an RDD has.

In Python:

# Create an RDD from a range and print its partition count
rdd = sc.parallelize(range(1, 10))
print(rdd.getNumPartitions())

In Scala:

// Create an RDD from a range and print its partition count
val rdd = sc.parallelize(1 to 100)
println(rdd.partitions.length)

If you have a DataFrame, you can call df.rdd to access the underlying RDD and then check its partitions the same way.
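As a quick self-contained sketch (the local[*] master, app name, and DataFrame contents below are illustrative, not from the original answer) — this also touches the default-parallelism part of the question, since sc.defaultParallelism is the partition count Spark falls back to when you don't specify one:

from pyspark.sql import SparkSession

# Start (or reuse) a local Spark session; local[*] uses all cores on the machine
spark = SparkSession.builder.master("local[*]").appName("partitions-demo").getOrCreate()
sc = spark.sparkContext

# The fallback partition count used by parallelize() etc. when none is given;
# in local mode it equals the number of cores available to Spark
print(sc.defaultParallelism)

# A DataFrame exposes its underlying RDD via df.rdd
df = spark.createDataFrame([(1, "a"), (2, "b")], ["id", "val"])
print(df.rdd.getNumPartitions())

spark.stop()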

MrChristine