
I have 2000 partitions and I'm trying to run the following code snippet:

    l = df.rdd.mapPartitionsWithIndex(lambda x, it: [(x, sum(1 for _ in it))]).collect()

Every variation of this snippet fails with the error "Ordinal must be >= 1". I have no idea what this means. What do I need to do to reliably print the length of each of my partitions? I'm writing in Python and running against Spark 2.3.0.

Sean Lindo

1 Answer


Use something like this:

    l = df.rdd.mapPartitionsWithIndex(lambda idx, it: [(idx, len(list(it)))]).collect()

Note the single-element list around the tuple: mapPartitionsWithIndex expects the function to return an iterable, so returning a bare tuple would flatten the index and the count into separate records instead of producing (index, count) pairs.
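In case it helps, here is a minimal, self-contained sketch of the same idea. The example DataFrame and the 8-partition repartition are made up for illustration; substitute your own df. One possible cause of the original error, though this is only a guess since the question doesn't show the imports, is a wildcard "from pyspark.sql.functions import *" shadowing Python's builtin sum; the len(list(...)) version avoids sum entirely, so it sidesteps that either way.

    from pyspark.sql import SparkSession

    # Hypothetical setup for illustration; in practice use your existing df.
    spark = SparkSession.builder.appName("partition-lengths").getOrCreate()
    df = spark.range(0, 1000).repartition(8)  # stand-in for a 2000-partition DataFrame

    # For each partition, emit a single (partition_index, record_count) pair.
    # The lambda must return an iterable, hence the single-element list.
    lengths = df.rdd.mapPartitionsWithIndex(
        lambda idx, it: [(idx, len(list(it)))]
    ).collect()

    for idx, count in sorted(lengths):
        print("partition %d: %d rows" % (idx, count))

Be aware that len(list(it)) materializes each partition's records in memory on the executor, which is fine for counting but worth knowing on very large partitions.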

pauli