
For some odd reason I need to take the column names of a dataframe and insert them as the first row (I cannot simply import the data without a header). I tried using a for comprehension to create a dataframe with 1 row and 30 columns (there are 30 headers) and union it with the original dataframe. What I got instead is a dataframe with 1 row and only 1 column, whose value is a list of 30 strings.

What I tried:

val headerDF = Seq((for (col <- data.columns) yield col)).toDF
display(headerDF)
Column A
["col1", "col2", "col3", ...]

Expected Behavior:

Column A Column B Column C
col1 col2 col3
  • Isn't the `Seq(..)` just useless? Could you split the code over multiple lines and give the type of each variable? Looks like you have a `Seq[Seq[String]]` on which you call `.toDF`. In my understanding, you just want a `Seq[String]`. – Gaël J Jul 12 '23 at 05:22
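
To make the types in the comment above concrete, here is a minimal sketch that splits the attempt into steps. It assumes a local Spark session, and `data` is a hypothetical three-column stand-in for the asker's dataframe. Note that `data.columns` is an `Array[String]`, so the wrapped value is a `Seq[Array[String]]`, which is why `.toDF` produces one row with a single array column.

```scala
import org.apache.spark.sql.SparkSession

// Assumption: a local session; in spark-shell or a notebook `spark` already exists
val spark = SparkSession.builder().master("local[1]").appName("header-types").getOrCreate()
import spark.implicits._

// Hypothetical stand-in for the asker's `data`
val data = Seq(("a", "b", "c")).toDF("col1", "col2", "col3")

// The for comprehension in the question is equivalent to data.columns itself
val names: Array[String] = for (c <- data.columns) yield c

// Wrapping the array in a Seq gives a sequence with ONE element (the whole array)
val wrapped: Seq[Array[String]] = Seq(names)

// One element => one row; one non-tuple value => one column of type array<string>
val headerDF = wrapped.toDF
headerDF.printSchema()
```

Each element of the `Seq` becomes a row, so to get 30 columns the single row has to be built from 30 separate column expressions rather than from one array value.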

1 Answer


One solution is to use spark.range(1) to create a one-row dataframe and then create one column per column name like this:

// a random dataframe with 4 columns
val df = Seq(("a", "b", "c", "d")).toDF("A", "B", "C", "D")
df.show
+---+---+---+---+
|  A|  B|  C|  D|
+---+---+---+---+
|  a|  b|  c|  d|
+---+---+---+---+
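// lit turns each column name into a literal value
import org.apache.spark.sql.functions.lit
// one-row dataframe with one literal column per column name
val header = spark.range(1).select(df.columns.map(c => lit(c) as c) : _*)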
df.union(header).show
+---+---+---+---+
|  A|  B|  C|  D|
+---+---+---+---+
|  a|  b|  c|  d|
|  A|  B|  C|  D|
+---+---+---+---+
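
Since the question asks for the header as the first row, a possible variant is to put `header` on the left side of the union and cast the data columns to strings so both sides have matching types. This is only a sketch under a few assumptions: a local Spark session, a hypothetical two-column `df` with one non-string column, and column names without dots or other special characters. Spark does not formally guarantee row order after a union unless you sort, so treat the "header first" placement as what usually happens in practice, not as a contract.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, lit}

// Assumption: a local session; in spark-shell or a notebook `spark` already exists
val spark = SparkSession.builder().master("local[1]").appName("header-row").getOrCreate()
import spark.implicits._

// Hypothetical dataframe with a string column and an integer column
val df = Seq(("a", 1), ("b", 2)).toDF("A", "B")

// One-row dataframe whose values are the column names
val header = spark.range(1).select(df.columns.map(c => lit(c).as(c)): _*)

// Cast every data column to string so both sides of the union share the same types
val body = df.select(df.columns.map(c => col(c).cast("string").as(c)): _*)

// Header on the left, data on the right
val withHeader = header.union(body)
withHeader.show()
```

`union` matches columns by position, not by name, so both sides must list the columns in the same order, which is the case here because both are derived from `df.columns`.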
Oli