
I have a DataFrame df1 with a column col1 that has the following structure:

StructField(recipientResource,ArrayType(StructType(List(StructField(resourceId,StringType,true),StructField(type,StringType,true))),true),true)

and another DataFrame df2 whose col1 has this structure:

StructField(recipientResource,StructType(List(StructField(resourceId,StringType,true),StructField(type,StringType,true))),true)

In order to run df1.union(df2), I tried to cast the column in df2 from StructType to ArrayType(StructType), but nothing I tried worked.

Can anyone suggest how to go about this? I'm new to PySpark, so any help is appreciated.

Vikas J
  • `array<struct<...>>` and `struct<...>` are two completely different types; you cannot cast one into the other. You could wrap the struct in `array` if that's what you mean, like `select(array(struct_column))`. – Alper t. Turker May 10 '18 at 18:19
  • An [mcve] with a small sample of your dataframes and the desired output would be helpful. See more on [how to create good reproducible apache spark dataframe examples](https://stackoverflow.com/questions/48427185/how-to-make-good-reproducible-apache-spark-dataframe-examples). – pault May 10 '18 at 18:35

1 Answer


Here is a simple solution using the array() function, which wraps each struct value in a one-element array:

Input:

df1 (with an ArrayType(StructType()) column): [screenshot not reproduced]

df2 (with a StructType() column): [screenshot not reproduced]

Code:

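from pyspark.sql.functions import array, col

# Wrap the StructType() column in a one-element array so it becomes ArrayType(StructType()), matching df1
df2 = df2.withColumn('recipientResource', array(col('recipientResource')))
df3 = df1.union(df2)  # the column types now match, so the union succeeds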

Output:

Modified df2: [screenshot not reproduced]

df3 (output after the union of df1 and df2): [screenshot not reproduced]