I was reading the official documentation of PySpark API reference for dataframe and below code snippet for transform function over a dataframe have me confused. I can't figure out why * is placed before sorted function in sort_columns_asc function defined below
from pyspark.sql.functions import col
df = spark.createDataFrame([(1, 1.0), (2, 2.0)], ["int", "float"])
def cast_all_to_int(input_df):
return input_df.select([col(col_name).cast("int") for col_name in input_df.columns])
def sort_columns_asc(input_df):
return input_df.select(*sorted(input_df.columns))
df.transform(cast_all_to_int).transform(sort_columns_asc).show()
+-----+---+
|float|int|
+-----+---+
| 1| 1|
| 2| 2|
+-----+---+
Please help me clarify the confusion.