
I have 2000 partitions and I'm trying to run the following code snippet:

    l = df.rdd.mapPartitionsWithIndex(lambda x, it: [(x, sum(1 for _ in it))]).collect()

Every variation of this snippet fails with the error "Ordinal must be >= 1". I have no idea what this means. What do I need to do to reliably print the length of each of my partitions? I'm writing in Python and running against Spark 2.3.0.

Sean Lindo

1 Answer


Use something like this:

    l = df.rdd.mapPartitionsWithIndex(lambda idx, it: [(idx, len(list(it)))]).collect()

Note the single-element list around the tuple: mapPartitionsWithIndex expects the function to return an iterable, so returning a bare tuple would flatten the index and the count into separate records instead of producing (index, count) pairs.
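In case it helps, here is a minimal, self-contained sketch of the same idea. The example DataFrame and the 8-partition repartition are made up for illustration; substitute your own df. One possible cause of the original error, though this is only a guess since the question doesn't show the imports, is a wildcard "from pyspark.sql.functions import *" shadowing Python's builtin sum; the len(list(...)) version avoids sum entirely, so it sidesteps that either way.

    from pyspark.sql import SparkSession

    # Hypothetical setup for illustration; in practice use your existing df.
    spark = SparkSession.builder.appName("partition-lengths").getOrCreate()
    df = spark.range(0, 1000).repartition(8)  # stand-in for a 2000-partition DataFrame

    # For each partition, emit a single (partition_index, record_count) pair.
    # The lambda must return an iterable, hence the single-element list.
    lengths = df.rdd.mapPartitionsWithIndex(
        lambda idx, it: [(idx, len(list(it)))]
    ).collect()

    for idx, count in sorted(lengths):
        print("partition %d: %d rows" % (idx, count))

Be aware that len(list(it)) materializes each partition's records in memory on the executor, which is fine for counting but worth knowing on very large partitions.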

pauli