I'm using MLlib with Python (PySpark) and would like to know how many RDDs get created in memory before my code executes. I'm performing transformations and actions on RDDs, so I would like to know the total number of RDDs that are created in memory.
2 Answers
0
The number of RDDs depends on your program: each transformation returns a new RDD.
But I think what you want to know here is the number of partitions an RDD is split into.
For that you can use: rdd.getNumPartitions()
See also: Show partitions on a pyspark RDD
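To make the partition count more concrete, here is a minimal sketch that reports both the number of partitions and how many elements each one holds. The helper name `partition_summary` and the sample data in `demo` are made up for illustration; `getNumPartitions()`, `glom()`, `map()` and `collect()` are standard PySpark RDD methods. Note that `glom().collect()` pulls data to the driver, so this is only reasonable for small RDDs.

```python
def partition_summary(rdd):
    """Return (number of partitions, list of element counts per partition).

    Hypothetical helper: works on any object exposing the RDD methods
    getNumPartitions(), glom(), map() and collect().
    """
    num_partitions = rdd.getNumPartitions()
    # glom() turns each partition into one list; len gives its size.
    sizes = rdd.glom().map(len).collect()
    return num_partitions, sizes


def demo():
    # Assumes a working local PySpark installation.
    from pyspark import SparkContext

    sc = SparkContext.getOrCreate()
    rdd = sc.parallelize(range(10), 3)  # explicitly ask for 3 partitions
    count, sizes = partition_summary(rdd)
    print("partitions:", count)
    print("elements per partition:", sizes)
```

Calling `demo()` on a machine with Spark installed prints the partition count followed by the per-partition sizes.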

Ajinkya Bhore
0
First of all, you asked about the number of RDDs. That depends on how you write your application code; there can be one or more RDDs in your application.
You can, however, find the number of partitions in an RDD.
For Scala:
someRDD.partitions.size
For PySpark:
someRDD.getNumPartitions()
If there is more than one RDD in your application, you can count the partitions of each RDD and sum them to get the total number of partitions.
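Summing the partition counts could look roughly like the sketch below. The helper `total_partitions` and the three sample RDDs in `demo` are hypothetical names chosen for illustration; only `getNumPartitions()`, `parallelize()` and `map()` come from PySpark itself. You have to pass in the RDDs you hold references to yourself; this sketch does not discover RDDs automatically.

```python
def total_partitions(rdds):
    """Sum getNumPartitions() over an iterable of RDDs (hypothetical helper)."""
    return sum(rdd.getNumPartitions() for rdd in rdds)


def demo():
    # Assumes a working local PySpark installation.
    from pyspark import SparkContext

    sc = SparkContext.getOrCreate()
    a = sc.parallelize(range(100), 4)   # 4 partitions requested
    b = a.map(lambda x: x * 2)          # map keeps the parent's 4 partitions
    c = sc.parallelize(["x", "y"], 2)   # 2 partitions requested
    print(total_partitions([a, b, c]))  # 4 + 4 + 2
```

Keep in mind that `a` and `b` are separate RDD objects even though `b` is derived from `a`, so both are counted.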

Strick