
Starting with a DataFrame:

import spark.implicits._ // needed for .toDF; already in scope in spark-shell

val someDF = Seq(
  (8, "bat", "h"),
  (64, "mouse", "t"),
  (-27, "horse", "x")
).toDF("number", "thing", "letter")

someDF.show()

+------+-----+------+
|number|thing|letter|
+------+-----+------+
|     8|  bat|     h|
|    64|mouse|     t|
|   -27|horse|     x|
+------+-----+------+

and a Map:

val lookup = Map(
  "number" -> "id",
  "thing" -> "animal"
)

I'd like to select and rename the columns such that number becomes id, thing becomes animal and so on.

Renaming alone is covered in another Stack Overflow question, Renaming column names of a DataFrame in Spark Scala. I'm sure there is also a straightforward way to do the select at the same time that I'm not seeing.
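For reference, the two-step route that the linked renaming question leads to can be sketched as follows: rename with `withColumnRenamed`, then select the new names. A local `SparkSession` is assumed so the snippet runs on its own. In spark-shell, `spark` already exists.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.col

// Assumption: a local session for a standalone run; spark-shell provides `spark` already.
val spark = SparkSession.builder().master("local[*]").appName("rename-then-select").getOrCreate()
import spark.implicits._

val someDF = Seq((8, "bat", "h"), (64, "mouse", "t"), (-27, "horse", "x"))
  .toDF("number", "thing", "letter")
val lookup = Map("number" -> "id", "thing" -> "animal")

// Step 1: rename each mapped column; withColumnRenamed is a no-op if the old name is absent.
val renamed: DataFrame = lookup.foldLeft(someDF) { case (df, (oldName, newName)) =>
  df.withColumnRenamed(oldName, newName)
}

// Step 2: keep only the renamed columns, in the DataFrame's original column order.
val newNames: Array[String] = someDF.columns.filter(lookup.contains).map(lookup)
val renamedSelected = renamed.select(newNames.map(col): _*)

renamedSelected.show()
```

This approach creates one projection per renamed column before the final select. The single-`select` approaches avoid that.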

I thought something along these lines would work, but I get lots of type mismatches. This happens even though the inputs are strings, and the same approach works with a Seq instead of a Map:

val renamed_selected = someDF.select(
      lookup.map(m => col(m._1).as(m._2))
    ):_*
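The mismatch in that attempt seems to have two causes. First, `: _*` has to sit inside the `select(...)` call. Second, mapping over a `Map` to something that is not a pair gives an `Iterable[Column]`, and an `Iterable` cannot be passed as varargs, so it needs `.toSeq`. Here is a minimal corrected sketch, again assuming a local `SparkSession`:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

// Assumption: a local session for a standalone run; spark-shell provides `spark` already.
val spark = SparkSession.builder().master("local[*]").appName("select-from-map").getOrCreate()
import spark.implicits._

val someDF = Seq((8, "bat", "h"), (64, "mouse", "t"), (-27, "horse", "x"))
  .toDF("number", "thing", "letter")
val lookup = Map("number" -> "id", "thing" -> "animal")

// Mapping the Map to Columns (not pairs) yields an Iterable[Column];
// .toSeq makes it a Seq, and `: _*` goes inside the select(...) call.
val renamedSelected = someDF.select(
  lookup.map { case (oldName, newName) => col(oldName).as(newName) }.toSeq: _*
)

renamedSelected.show()
```

The column order here follows the Map's iteration order, not the DataFrame's. Small immutable Maps (up to four entries) iterate in insertion order, but larger ones are hash-based. Also, any key that is not a column of the DataFrame makes `select` fail with an `AnalysisException`.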

So the desired output is:

+---+------+
| id|animal|
+---+------+
|  8|   bat|
| 64| mouse|
|-27| horse|
+---+------+

Thanks

Clarification on duplicate question flag: The question Renaming column names of a DataFrame in Spark Scala does not cover how to rename and select columns at the same time.

wab
  • Clarification on duplicate question flag: The question https://stackoverflow.com/questions/35592917/renaming-column-names-of-a-dataframe-in-spark-scala does not cover how to rename and select columns at the same time. – wab Oct 29 '18 at 15:29

1 Answer


Here is one way: use pattern matching to check whether the name exists in the lookup. If it does, give the column an alias; otherwise keep the original name:

import org.apache.spark.sql.functions.col

val cols = someDF.columns.map(name => lookup.get(name) match {
  case Some(newname) => col(name).as(newname) 
  case None => col(name) 
})

someDF.select(cols: _*).show
+---+------+------+
| id|animal|letter|
+---+------+------+
|  8|   bat|     h|
| 64| mouse|     t|
|-27| horse|     x|
+---+------+------+
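A more compact variant of the same `map` idea uses `Map.getOrElse` to fall back to the original name when a column is not in the lookup. Aliasing a column to its own name changes nothing. As before, a local `SparkSession` is assumed so the sketch runs standalone:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

// Assumption: a local session for a standalone run; spark-shell provides `spark` already.
val spark = SparkSession.builder().master("local[*]").appName("rename-with-default").getOrCreate()
import spark.implicits._

val someDF = Seq((8, "bat", "h"), (64, "mouse", "t"), (-27, "horse", "x"))
  .toDF("number", "thing", "letter")
val lookup = Map("number" -> "id", "thing" -> "animal")

// Every column is kept; unmapped columns are aliased to their own name.
val cols = someDF.columns.map(name => col(name).as(lookup.getOrElse(name, name)))

val renamedAll = someDF.select(cols: _*)
renamedAll.show()
```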

If you only need columns in the lookup:

val cols = someDF.columns.collect(name => lookup.get(name) match { 
  case Some(newname) => col(name).as(newname) 
})

someDF.select(cols: _*).show
+---+------+
| id|animal|
+---+------+
|  8|   bat|
| 64| mouse|
|-27| horse|
+---+------+
Psidom
  • Thanks @Psidom - Realised my question wasn't 100% clear. I'd also like to drop the columns that are not in the map. Would `case None => None` cover that? – wab Oct 24 '18 at 15:59
  • If you only need columns in the `lookup`, you can use `collect` instead of `map`; it drops the columns that don't match. See the update. – Psidom Oct 24 '18 at 16:03
  • This works but I get a type checking error - any idea how to specify the correct types? `Type mismatch, expected: PartialFunction[String, NotInferedB], actual: Nothing => Column` – wab Oct 29 '18 at 14:57
  • Figured it out: ` val x: PartialFunction[String, Column] = { name: String => lookup.get(name) match { case Some(newname) => col(name).as(newname): Column } } val cols = dataOut.columns.collect{x}` – wab Oct 29 '18 at 17:17
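On the `collect` type error in the comments: according to the comment above, the original call runs, but the type checker rejects it. A pattern-matching anonymous function (`{ case ... }`) is a `PartialFunction` by construction. Moving the lookup check into a `case` guard should therefore satisfy stricter checkers without the explicit `PartialFunction[String, Column]` annotation. This is a sketch that assumes a local `SparkSession`:

```scala
import org.apache.spark.sql.{Column, SparkSession}
import org.apache.spark.sql.functions.col

// Assumption: a local session for a standalone run; spark-shell provides `spark` already.
val spark = SparkSession.builder().master("local[*]").appName("collect-rename").getOrCreate()
import spark.implicits._

val someDF = Seq((8, "bat", "h"), (64, "mouse", "t"), (-27, "horse", "x"))
  .toDF("number", "thing", "letter")
val lookup = Map("number" -> "id", "thing" -> "animal")

// `{ case ... }` is a PartialFunction[String, Column], so collect keeps only
// the names that pass the guard and renames them; "letter" is dropped.
val cols: Array[Column] = someDF.columns.collect {
  case name if lookup.contains(name) => col(name).as(lookup(name))
}

val renamedSelected = someDF.select(cols: _*)
renamedSelected.show()
```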