
After joining two DataFrames, I find that the column order is not what I expected.

For example, joining two DataFrames with columns [b,c,d,e] and [a,b] on b yields the column order [b,a,c,d,e].

How can I change the order of the columns (e.g., to [a,b,c,d,e])? I've found ways to do this in Python and R, but not in Scala or Java. Are there any methods for swapping or reordering DataFrame columns?

jest jest

2 Answers


In Scala you can use the "splat" (: _*) syntax to pass a variable-length list of columns to the varargs DataFrame.select() method.

To address your example, you can get the existing column names via DataFrame.columns, which returns an array of strings. Sort that array, convert each name to a Column, and then "splat" the result into select():

import org.apache.spark.sql.functions.col

// Array[String] (b,a,c,d,e) => sorted, then mapped to Array[Column] (a,b,c,d,e)
val mySortedCols = myDF.columns.sorted.map(str => col(str))

val myNewDF = myDF.select(mySortedCols: _*)
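One caveat with columns.sorted: it uses Scala's default String ordering, which compares UTF-16 code units, so uppercase names sort before lowercase ones (e.g. "C" before "a"). If that is not the order you want, sortBy with a key is a drop-in alternative. A small plain-Scala sketch (no Spark needed; the object and method names are hypothetical, for illustration only):

```scala
object SortColumns {
  // Default String ordering: compares UTF-16 code units, so "C" (67) < "a" (97).
  def byCodeUnit(cols: Array[String]): Array[String] = cols.sorted

  // Case-insensitive alternative: sort on a lowercased key (sortBy is stable).
  def caseInsensitive(cols: Array[String]): Array[String] = cols.sortBy(_.toLowerCase)

  def main(args: Array[String]): Unit = {
    val cols = Array("b", "C", "a") // e.g. what df.columns might return
    println(byCodeUnit(cols).mkString(","))      // C,a,b
    println(caseInsensitive(cols).mkString(",")) // a,b,C
  }
}
```

Either result can be mapped to columns and splatted into select() exactly as in the answer above.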
chucknelson

One way is to reorder the columns with select() after the join:

import spark.implicits._ // needed for .toDF outside spark-shell; spark is your SparkSession

case class Person(name: String, age: Int)
val persons = Seq(Person("test", 10)).toDF()

persons.show
+----+---+
|name|age|
+----+---+
|test| 10|
+----+---+

persons.select("age", "name").show

+---+----+
|age|name|
+---+----+
| 10|test|
+---+----+
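Listing every column by hand gets tedious on a wide frame, and mistakes only show up when select() runs: a misspelled name fails analysis, while a forgotten name silently drops that column. A minimal sketch, in plain Scala with no Spark dependency, of a hypothetical helper checkedOrder that confirms a hand-written order is a permutation of the current column names before you select it:

```scala
// Hypothetical helper (not a Spark API): checks that `desired` is a
// permutation of `current` (e.g. df.columns), so that selecting it
// neither drops nor invents a column.
object CheckedOrder {
  def checkedOrder(current: Seq[String], desired: Seq[String]): Seq[String] = {
    val missing = current.diff(desired) // columns you forgot to list
    val unknown = desired.diff(current) // names not in the frame (typos, duplicates)
    require(missing.isEmpty && unknown.isEmpty,
      s"missing: ${missing.mkString(", ")}; unknown: ${unknown.mkString(", ")}")
    desired
  }

  def main(args: Array[String]): Unit = {
    val joinedCols = Seq("b", "a", "c", "d", "e") // the order produced by the join in the question
    println(checkedOrder(joinedCols, Seq("a", "b", "c", "d", "e")).mkString(","))
  }
}
```

With Spark in scope and org.apache.spark.sql.functions.col imported, the result plugs into the same splat as the first answer: df.select(checkedOrder(df.columns.toSeq, Seq("a", "b", "c", "d", "e")).map(col): _*).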
Kestemont Max
Once a DataFrame has so many columns that the reordering is more than a swap or two, what other options are there? My guess is something involving `columns()` (Java API)... – jest jest Jun 28 '16 at 22:16
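For the wide-frame case raised in this comment, one option is to name only the columns that must come first and let the rest keep their current relative order. A minimal sketch, assuming a hypothetical helper moveToFront written in plain Scala (Spark is not needed to compute the order itself):

```scala
// Hypothetical helper (not a Spark API): returns `current` with the names in
// `first` moved to the front, in the order given; all other columns keep
// their existing relative order.
object MoveToFront {
  def moveToFront(current: Seq[String], first: Seq[String]): Seq[String] = {
    val unknown = first.filterNot(current.contains)
    require(unknown.isEmpty, s"not in the DataFrame: ${unknown.mkString(", ")}")
    val firstSet = first.toSet
    first.distinct ++ current.filterNot(firstSet.contains)
  }

  def main(args: Array[String]): Unit = {
    val cols = Seq("b", "a", "c", "d", "e")
    // Only "a" needs to move; b, c, d, e stay in the order the join produced.
    println(moveToFront(cols, Seq("a")).mkString(","))
  }
}
```

In Spark this would be used as df.select(moveToFront(df.columns.toSeq, Seq("a")).map(col): _*), so only the handful of columns you care about ever needs to be spelled out.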