1

Been struggling with this for a while and still can't make my mind around it.

I'm trying to flatMap (or use .withColumn with explode() instead as it seems easier so I don't lose column names), but I'm always getting the error UDTF expected 2 aliases but got 'name' instead.

I've revisited some similar questions but none of them shed some light as their schemas are too simple.

The column of the schema I'm trying to perform flatMap with is the following one...

StructField(CarMake,
  StructType(
    List(
      StructField(
        Models,
        MapType(
          StringType,
          StructType(
            List(
              StructField(Variant, StringType),
              StructField(GasOrPetrol, StringType)
            )
          )
        )
      )
    )
  ))

What I'm trying to achieve by calling explode() like this...

carsDS
      .withColumn("modelsAndVariant", explode($"carmake.models"))

...is to achieve a Row without that nested Map and Struct so I get as many rows as variants are.

Example input

(country: Sweden, carMake: Volvo, carMake.Models: {"850": ("T5", "petrol"), "V50": ("T5", "petrol")})

Example output

(country: Sweden, carMake: Volvo, Model: "850", Variant: "T5", GasOrPetrol: "petrol"}
(country: Sweden, carMake: Volvo, Model: "V50", Variant: "T5", GasOrPetrol: "petrol"}

Basically leaving the nested Map with its inner Struct all in the same level.

czr_RR
  • 541
  • 5
  • 16

1 Answers1

1

Try this:

case class Models(variant:String, gasOrPetrol:String)
case class CarMake(brand:String, models : Map[String, Models] )
case class MyRow(carMake:CarMake)

val df = List(
  MyRow(CarMake("volvo",Map(
    "850" -> Models("T5","petrol"),
    "V50" -> Models("T5","petrol")
  )))
).toDF()

df.printSchema()

df.show()

gives

root
 |-- carMake: struct (nullable = true)
 |    |-- brand: string (nullable = true)
 |    |-- models: map (nullable = true)
 |    |    |-- key: string
 |    |    |-- value: struct (valueContainsNull = true)
 |    |    |    |-- variant: string (nullable = true)
 |    |    |    |-- gasOrPetrol: string (nullable = true)


+--------------------+
|             carMake|
+--------------------+
|[volvo, [850 -> [...|
+--------------------+

now explode, note that withColumn does not work because èxplode on a map returns 2 columns (key and value), so you need to use select:

val cols: Array[Column] = df.columns.map(col)

df
  .select((cols:+explode($"carMake.models")):_*)
  .select((cols:+$"key".as("model"):+$"value.*"):_*)
  .show()

gives:

+--------------------+-----+-------+-----------+
|             carMake|model|variant|gasOrPetrol|
+--------------------+-----+-------+-----------+
|[volvo, [850 -> [...|  850|     T5|     petrol|
|[volvo, [850 -> [...|  V50|     T5|     petrol|
+--------------------+-----+-------+-----------+
Raphael Roth
  • 26,751
  • 15
  • 88
  • 145
  • Thanks a lot for the fast answer! Sorry for not giving code to quick test, I had to leave in a rush and couldn't come back to this until now, really appreciated :) – czr_RR Aug 09 '19 at 20:55