
I was reading the official PySpark API reference for DataFrame, and the code snippet below for the transform function has me confused. I can't figure out why * is placed before the sorted call in the sort_columns_asc function defined below.

from pyspark.sql.functions import col
df = spark.createDataFrame([(1, 1.0), (2, 2.0)], ["int", "float"])
def cast_all_to_int(input_df):
    return input_df.select([col(col_name).cast("int") for col_name in input_df.columns])
def sort_columns_asc(input_df):
    return input_df.select(*sorted(input_df.columns))
df.transform(cast_all_to_int).transform(sort_columns_asc).show()
+-----+---+
|float|int|
+-----+---+
|    1|  1|
|    2|  2|
+-----+---+

Please help me clarify the confusion.

isilia
  • Does this answer your question? https://stackoverflow.com/questions/2921847/what-does-the-star-and-doublestar-operator-mean-in-a-function-call – 过过招 Feb 15 '22 at 04:37
  • the aforementioned link will absolutely help you understand it. Helpful note: before you go there, just work out what `df.columns` and `sorted()` return. – samkart Feb 15 '22 at 05:25
  • Please accept the answer if it solved your problem – JAdel Feb 18 '22 at 15:40

1 Answer


It's used to unpack a collection: the elements are passed to the call as separate positional arguments, which effectively removes one level of nesting.

# 1D Array
collection1 = [1,2,3,4]
print(*collection1)
1 2 3 4

# 2D Array
collection2 = [[1,2,3,4]]
print(*collection2)
[1, 2, 3, 4]
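
The same unpacking works for any call that takes positional arguments, not just print. A minimal sketch (add3 is a made-up function, purely for illustration):

def add3(a, b, c):
    return a + b + c

args = [3, 1, 2]
# * spreads the list into three separate positional arguments,
# so add3(*args) is the same call as add3(3, 1, 2).
print(add3(*args))
6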

In your example you are unpacking the column names from

example = ["int", "float"]

to

print(*sorted(example))
float int
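
Applied to your DataFrame (a sketch assuming the same df from your snippet, whose columns are ["int", "float"]), the unpacking turns the sorted list of column names into separate string arguments to select:

sorted(df.columns)                # ['float', 'int']
df.select(*sorted(df.columns))    # same as df.select("float", "int")

As your cast_all_to_int shows, select also accepts a list directly, so the unpacking here is mostly a stylistic choice; it makes the call equivalent to listing the column names one by one.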

Check out the question on the * and ** operators linked in the comments above for further information.

JAdel