
How do I change a column's type inside an array of structs in PySpark? For example, I would like to change userid from int to long.

root
 |-- id: string (nullable = true)
 |-- numbers: array (nullable = true)
 |    |-- element: struct (containsNull = true)
 |    |    |-- m1: long (nullable = true)
 |    |    |-- m2: long (nullable = true)
 |    |    |-- m3: struct (nullable = true)
 |    |    |    |-- userid: integer (nullable = true)

1 Answer


It would have been useful if you had provided a reproducible df as well.

Following your comments below, see the following code.

from pyspark.sql.types import (StructType, StructField, StringType,
                               LongType, IntegerType, ArrayType)

# Reproducible schema: numbers is an array of structs, with userid nested as integer
sch = StructType([StructField('id', StringType(), False),
                  StructField('numbers', ArrayType(
                      StructType([StructField('m1', LongType(), True),
                                  StructField('m2', LongType(), True),
                                  StructField('m3', StructType([StructField('userid', IntegerType(), True)]), True)])), True)])



df=spark.createDataFrame([
  ('21',[(1234567, 9876543,(1,))]),
  ('34',[(63467892345, 19523789,(2,))])
], schema=sch)
  
  

df.printSchema()

root
 |-- id: string (nullable = false)
 |-- numbers: array (nullable = true)
 |    |-- element: struct (containsNull = true)
 |    |    |-- m1: long (nullable = true)
 |    |    |-- m2: long (nullable = true)
 |    |    |-- m3: struct (nullable = true)
 |    |    |    |-- userid: integer (nullable = true)

Solution

df1 = df.selectExpr(
  "id",
  "CAST(numbers AS array<struct<m1:long,m2:long, m3:struct<userid:double>>>) numbers"
)

df1.printSchema()

root
 |-- id: string (nullable = false)
 |-- numbers: array (nullable = true)
 |    |-- element: struct (containsNull = true)
 |    |    |-- m1: long (nullable = true)
 |    |    |-- m2: long (nullable = true)
 |    |    |-- m3: struct (nullable = true)
 |    |    |    |-- userid: double (nullable = true)
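
The same CAST works for the conversion the question actually asked about (int to long). A minimal sketch, assuming you prefer the Column.cast API with a DDL type string rather than selectExpr (df2 is a hypothetical name; Spark 2.x+ accepts DDL strings here):

from pyspark.sql.functions import col

# Cast the whole array<struct> column, changing only userid from int to long
df2 = df.withColumn(
    "numbers",
    col("numbers").cast("array<struct<m1:long,m2:long,m3:struct<userid:long>>>")
)

df2.printSchema()
# Expected schema:
# root
#  |-- id: string (nullable = false)
#  |-- numbers: array (nullable = true)
#  |    |-- element: struct (containsNull = true)
#  |    |    |-- m1: long (nullable = true)
#  |    |    |-- m2: long (nullable = true)
#  |    |    |-- m3: struct (nullable = true)
#  |    |    |    |-- userid: long (nullable = true)

Note that the cast rewrites the whole array<struct> type in one expression, so every field you want to keep must be listed with its (possibly unchanged) type.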