I have a DataFrame with 2000 partitions and I'm trying to count the rows in each partition with the following snippet:
l = df.rdd.mapPartitionsWithIndex(lambda idx, it: [(idx, sum(1 for _ in it))]).collect()
Every variation of this snippet fails with the same error: Ordinal must be >= 1. I have no idea what this means. What do I need to do to reliably print the number of rows in each of my partitions? I'm writing in Python and running against Spark 2.3.0.
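
For reference, here is a rough, self-contained sketch of what I'm trying to run. The local SparkSession, the spark.range stand-in DataFrame, and the partition count of 8 are just placeholders for this question; the real job uses my actual 2000-partition DataFrame:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.master("local[*]").getOrCreate()

    # Stand-in for my real 2000-partition DataFrame
    df = spark.range(0, 1000).repartition(8)

    # For each partition, emit a (partition_index, row_count) pair
    counts = df.rdd.mapPartitionsWithIndex(
        lambda idx, it: [(idx, sum(1 for _ in it))]
    ).collect()

    for idx, n in counts:
        print(idx, n)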