Basically, I want to do the following, but without the for-loop:
from pyspark.sql import functions as F

# One groupby pass per column; works, but repeats the shuffle n times
uniqs = {}
for myCol in df.schema.names:
    uniqs[myCol] = df.groupby("colX").agg(F.countDistinct(myCol)).collect()
I tried
uniqs = df.groupby("colX").agg(F.countDistinct(*df.schema.names)).collect()
but that counts the distinct combinations of all the columns taken together (a single count per group), not a separate distinct count per column. The reason I want to avoid the for-loop is that it runs the groupby operation n times instead of just once, incurring heavy overhead.
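For reference, what I'm hoping for is something that builds one aggregate expression per column up front and evaluates them all in a single groupby pass, roughly along these lines (a sketch, which I haven't managed to verify on 1.6.2; I exclude colX itself, since its distinct count within each group is trivially 1):

# Sketch: one countDistinct expression per column, evaluated in a single pass
exprs = [F.countDistinct(c) for c in df.schema.names if c != "colX"]
uniqs = df.groupby("colX").agg(*exprs).collect()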
I'm on Spark 1.6.2.