
I have a DataFrame with this schema:

 |-- Col1 : string (nullable = true)
 |-- Col2 : string (nullable = true)
 |-- Col3 : struct (nullable = true)
 |    |-- 513: long (nullable = true)
 |    |-- 549: long (nullable = true)

When I flatten the struct with

df.select("Col1","Col2","Col3.*").show

I get:

+-----------+--------+------+------+
|       Col1|    Col2|   513|   549|
+-----------+--------+------+------+
| AAAAAAAAA |  BBBBB |    39|    38|
+-----------+--------+------+------+

Now I want to rename the struct fields so that the output looks like this:

+-----------+--------+---------+--------+
|       Col1|    Col2| Col3=513|Col3=549|
+-----------+--------+---------+--------+
| AAAAAAAAA |  BBBBB |       39|      38|
+-----------+--------+---------+--------+

The fields inside the struct are dynamic, so I can't hard-code the names with withColumnRenamed.

lucy

1 Answer


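Since you are asking about renaming fields inside a struct, you can do this with the schema DSL: build a new StructType with the renamed fields and cast the struct column to it. A struct-to-struct cast maps fields by position, so only the names change.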

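import org.apache.spark.sql.types._
import spark.implicits._ // enables the $"..." column syntax; assumes a SparkSession named `spark`

// Look up the struct type of Col3 in the DataFrame's schema
val schema: StructType = df.schema("Col3").dataType.asInstanceOf[StructType]
// Same fields, types and nullability; only the names get the "Col3=" prefix
val newSchema = StructType(schema.fields.map(sf => sf.copy(name = "Col3=" + sf.name)))

df
  .withColumn("Col3", $"Col3".cast(newSchema))
  .printSchema()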

This gives:

root
 |-- Col1: string (nullable = true)
 |-- Col2: string (nullable = true)
 |-- Col3: struct (nullable = false)
 |    |-- Col3=513: long (nullable = true)
 |    |-- Col3=549: long (nullable = true)

Then you can unpack it with select($"Col3.*").
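Putting the cast and the unpacking together, a minimal sketch could look like the following. It assumes a local SparkSession and rebuilds the question's sample row by hand; the object RenameStructFields and the helper renameAndFlatten are hypothetical names introduced here for illustration, not part of Spark. Note that the non-struct columns are placed first, so if the struct sat between other columns, the column order would change.

```scala
import org.apache.spark.sql.{DataFrame, Row, SparkSession}
import org.apache.spark.sql.functions.col
import org.apache.spark.sql.types._

object RenameStructFields {
  // Hypothetical helper: prefix every field of the struct column `structCol`
  // with "<structCol>=" via a struct-to-struct cast, then flatten the struct.
  // Non-struct columns come first, followed by the renamed fields.
  def renameAndFlatten(df: DataFrame, structCol: String): DataFrame = {
    val st = df.schema(structCol).dataType.asInstanceOf[StructType]
    // Same field types and nullability; only the names change
    val renamed = StructType(st.fields.map(f => f.copy(name = s"$structCol=${f.name}")))
    val others = df.columns.toSeq.filterNot(_ == structCol).map(c => col(c))
    df.withColumn(structCol, col(structCol).cast(renamed))
      .select(others :+ col(s"$structCol.*"): _*)
  }

  def main(args: Array[String]): Unit = {
    // Assumption: a local SparkSession, for demonstration only
    val spark = SparkSession.builder().master("local[1]").appName("rename-struct").getOrCreate()
    // Sample data mirroring the question's schema and values
    val schema = StructType(Seq(
      StructField("Col1", StringType),
      StructField("Col2", StringType),
      StructField("Col3", StructType(Seq(StructField("513", LongType), StructField("549", LongType))))
    ))
    val df = spark.createDataFrame(
      java.util.Arrays.asList(Row("AAAAAAAAA", "BBBBB", Row(39L, 38L))), schema)
    renameAndFlatten(df, "Col3").show()
    spark.stop()
  }
}
```

Because the field names are read from the schema at runtime, this works no matter which numeric keys end up inside Col3.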

Alternatively, you could unpack the struct first and then rename every column whose name is a number.
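A sketch of that alternative, assuming the renaming is done positionally with Dataset.toDF (the answer does not say which rename method to use). The object RenameNumericColumns and the helper flattenThenRename are hypothetical names. Since only the column name is inspected, a top-level column that already had a purely numeric name would be prefixed too.

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.col

object RenameNumericColumns {
  // Hypothetical helper: flatten the struct column first, then prefix every
  // column whose name consists only of digits with "<structCol>=".
  // Caveat: any pre-existing top-level numeric column is renamed as well.
  def flattenThenRename(df: DataFrame, structCol: String): DataFrame = {
    val others = df.columns.toSeq.filterNot(_ == structCol).map(c => col(c))
    // Expand the struct's fields into top-level columns
    val flat = df.select(others :+ col(s"$structCol.*"): _*)
    val newNames = flat.columns.toSeq.map { c =>
      if (c.nonEmpty && c.forall(_.isDigit)) s"$structCol=$c" else c
    }
    // toDF(names*) renames all columns positionally in one step
    flat.toDF(newNames: _*)
  }
}
```

Using toDF avoids chaining one withColumnRenamed call per field, which matters when the number of fields is not known in advance.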

Raphael Roth