I'm looking for the PySpark equivalent of this question: How to get the number of elements in a partition?
Specifically, I want to programmatically count the number of elements in each partition of a PySpark RDD or DataFrame (I know this information is available in the Spark Web UI).
This attempt:
df.foreachPartition(lambda iter: sum(1 for _ in iter))
results in:
AttributeError: 'NoneType' object has no attribute '_jvm'
I do not want to collect the contents of the iterator into memory.