
I have a DataFrame with this structure:

root
 |-- features: struct (nullable = true)
 |    |-- value: double (nullable = true)

and I want to convert the value field of double type into a values field of array type. How can I do that?
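
For reference, here is a minimal sketch of how a DataFrame with this schema can be built (the sample value 1.0 is made up):

from pyspark.sql import SparkSession, Row

spark = SparkSession.builder.getOrCreate()

# One row whose `features` struct holds a single double field, matching the schema above
df = spark.createDataFrame([Row(features=Row(value=1.0))])
df.printSchema()
# root
#  |-- features: struct (nullable = true)
#  |    |-- value: double (nullable = true)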

1 Answer

You can specify the conversion explicitly using struct and array:

import pyspark.sql.functions as F

df.printSchema()
#root
# |-- features: struct (nullable = false)
# |    |-- value: double (nullable = false)

df2 = df.withColumn(
    'features',
    F.struct(
        F.array(F.col('features')['value']).alias('values')
    )
)

df2.printSchema()
#root
# |-- features: struct (nullable = false)
# |    |-- values: array (nullable = false)
# |    |    |-- element: double (containsNull = false)
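
For instance, with a single sample row the double ends up wrapped in a one-element array (a self-contained sketch; the sample value and the show() output layout are illustrative):

from pyspark.sql import SparkSession, Row
import pyspark.sql.functions as F

spark = SparkSession.builder.getOrCreate()

# Sample input matching the question's schema (the value 1.0 is made up)
df = spark.createDataFrame([Row(features=Row(value=1.0))])

# Wrap the double field in an array and rebuild the struct with a `values` field
df2 = df.withColumn(
    'features',
    F.struct(F.array(F.col('features')['value']).alias('values'))
)

df2.select('features.values').show()
# +------+
# |values|
# +------+
# | [1.0]|
# +------+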