* in pyspark list comprehension

Asked Jun 28 '17 at 07:17

Active Jun 28 '17 at 16:10

Viewed 32 times

I'm reading Learning PySpark. In the book, the author first creates a DataFrame:

# Assumes an active SparkSession named `spark` (as in the pyspark shell).
df_miss = spark.createDataFrame([(1, 143.5, 5.6, 28, 'M', 100000),
                                 (2, 167.2, 5.4, 45, 'M', None),
                                 (3, None, 5.2, None, None, None),
                                 (4, 144.5, 5.9, 33, 'M', None),
                                 (5, 133.2, 5.7, 54, 'F', None),
                                 (6, 124.1, 5.2, None, 'F', None),
                                 (7, 129.2, 5.3, 42, 'M', 76000)],
                                ['id', 'weight', 'height', 'age', 'gender', 'income'])

Then he uses this expression to calculate the fraction of missing values in each column:

import pyspark.sql.functions as fn  # fn is assumed to be this module alias

df_miss.agg(*[(1 - (fn.count(c) / fn.count('*'))).alias(c + '_missing')
              for c in df_miss.columns]).show()

What are the two * characters for, especially the second one? Are there any resources on this kind of expression? Thanks a lot!

edited Jun 28 '17 at 16:10

Daniel

asked Jun 28 '17 at 07:17

Peng Dong

This is not directly related to pyspark. https://stackoverflow.com/questions/12786102/unpacking-function-argument – DeepSpace Jun 28 '17 at 07:22
Also check this: https://stackoverflow.com/questions/36901/what-does-double-star-and-star-do-for-parameters – Leonard2 Jun 28 '17 at 07:24
Thank you so much for these resources! I've been stuck here for a long time – Peng Dong Jun 30 '17 at 09:04
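
For anyone landing here later, the two stars in the question can be sketched in plain Python without Spark. The first `*` is ordinary argument unpacking: it spreads the list built by the comprehension into separate positional arguments. The second is just the string `'*'` passed to `fn.count`, which counts all rows (like SQL's `COUNT(*)`), whereas `fn.count(c)` counts only the non-null values in column `c`. In the sketch below, `agg` is a hypothetical stand-in function, not PySpark's `DataFrame.agg`, and the `weight` list simply copies the weight column from the question's data.

```python
# Plain-Python sketch; agg() is a hypothetical stand-in, not PySpark's DataFrame.agg.
def agg(*exprs):
    """Accept any number of positional arguments and return them as a tuple."""
    return exprs

columns = ['id', 'weight', 'height']

# The list comprehension builds one "expression" (here just a string) per column.
exprs = [c + '_missing' for c in columns]

# Without *, the whole list arrives as a single argument.
packed = agg(exprs)

# With *, each list element becomes its own positional argument.
unpacked = agg(*exprs)

print(packed)    # (['id_missing', 'weight_missing', 'height_missing'],)
print(unpacked)  # ('id_missing', 'weight_missing', 'height_missing')

# What 1 - count(c) / count('*') computes, imitated for the weight column.
weight = [143.5, 167.2, None, 144.5, 133.2, 124.1, 129.2]
total_rows = len(weight)                        # like fn.count('*'): every row
non_null = sum(v is not None for v in weight)   # like fn.count('weight'): non-null only
missing_fraction = 1 - non_null / total_rows    # one missing value out of seven rows
print(missing_fraction)
```

So `df_miss.agg(*[...])` is roughly the same as writing `df_miss.agg(expr_for_id, expr_for_weight, ...)` by hand, one aggregate expression per column.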

0 Answers