I'm creating a structure of nested elements, and I'm having difficulty working out how to add a nested struct that depends on a test. Have a look at my code below. I want to add a struct conditionally:

(if x2.fields3 == 1, create struct_1; if x2.fields3 == 2, create struct_2)

from pyspark.sql.functions import col, lit, struct

df = df.withColumn("General",
    struct(
        col("x1.field1").alias("gen1"),
        col("x1.field2").alias("gen2"),
        struct(
            col("x1.field1").alias("gen3.1"),
            col("x2.field1").alias("gen3.2"),
            col("x1.field4").alias("gen3.3"),
            col("x2.field4").alias("gen3.4"),
            col("x1.field5").alias("gen3.5"),
            col("x1.field3").alias("gen3.6"),
            struct(
                struct(
                    lit("AA").alias("gen3.7.1.1"),
                    lit("BB").alias("gen3.7.1.2")
                ).alias("gen3.7.1")
                # Add new struct with test
            ).alias("gen3.7")
        ).alias("gen3")
    )
).drop("x1", "x2")
blackbishop
over

1 Answer

Simply use `when` with a condition:

when(col("x2.fields3") == lit(1), struct(...).alias("struct_1"))\
.when(col("x2.fields3") == lit(2), struct(...).alias("struct_2"))
blackbishop
  • Thank you for your help. I didn't understand what I can put in the struct(....). – over Jan 02 '20 at 17:03
  • As you didn't specify what you want in the structs, I just put `...`, just add the columns or literals you want to the structs ;-) – blackbishop Jan 02 '20 at 17:17
  • (when(col("x2.field3") == lit(1), struct(col("x1.field1").alias("index2")).alias("struct_1"))) (when(col("x2.field3") == lit(2), struct(col("x1.field1").alias("Index1")).alias("struct_2"))) – over Jan 03 '20 at 12:14
  • Hi @over, see [this](https://stackoverflow.com/a/36332079/1386551) for how to check if column exists. – blackbishop Jan 03 '20 at 12:59
  • Thank you so much, but I got an error when I tried to run your code lines. – over Jan 03 '20 at 13:09
  • Py4JJavaError: An error occurred while calling o13564.withColumn. : org.apache.spark.sql.AnalysisException: cannot resolve 'CASE WHEN (CAST(`x2`.`field3` AS INT) = 1) THEN named_struct('index2', `x1`.`field1`) WHEN (CAST(`x2`.`field3` AS INT) = 2) THEN nam – over Jan 03 '20 at 13:10
  • Your link uses Scala, @blackbishop, and as you know I am using PySpark. – over Jan 03 '20 at 13:15
  • Below the scala code there is also a python equivalent, please read the answer entirely. – blackbishop Jan 03 '20 at 13:18
  • Do you know why I get this error, please? @blackbishop – over Jan 03 '20 at 13:25