10

For example,

val columns=Array("column1", "column2", "column3")
val df=sc.parallelize(Seq(
(1,"example1", Seq(0,2,5)),
(2,"example2", Seq(1,20,5)))).toDF(columns)

How can I set the column names from a String array? Is it also possible to specify data types inside toDF()?

Devi

3 Answers

14

toDF() takes a repeated parameter of type String, so you can use the _* type annotation to pass a sequence:

val df=sc.parallelize(Seq(
  (1,"example1", Seq(0,2,5)),
  (2,"example2", Seq(1,20,5)))).toDF(columns: _*)

For more on repeated parameters, see Section 4.6.2 of the Scala Language Specification.
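The same mechanism can be tried without Spark. The following is only a minimal sketch: describe is a hypothetical stand-in for any method with a repeated String parameter (it is not part of Spark), used here to show how an Array is expanded with : _*.

```scala
object VarargsDemo {
  // Hypothetical method with a repeated parameter, shaped like toDF(colNames: String*)
  def describe(colNames: String*): String = colNames.mkString(", ")

  def main(args: Array[String]): Unit = {
    val columns = Array("column1", "column2", "column3")

    // Passing the names one by one
    println(describe("column1", "column2", "column3"))

    // Expanding the array into individual arguments with : _*
    println(describe(columns: _*))

    // describe(columns) would not compile: an Array[String] is not a String
  }
}
```

Both calls print the same comma-joined list of names, because : _* hands each element of the array to the method as a separate argument.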

Tzach Zohar
8
val df=sc.parallelize(Seq(
(1,"example1", Seq(0,2,5)),
(2,"example2", Seq(1,20,5)))).toDF("column1", "column2", "column3")

toDF() takes comma-separated strings.

anshul_cached
6

toDF() is defined in the Spark documentation as:

def toDF(colNames: String*): DataFrame

So you need to pass your array as varargs, as also described here. That means you need to do the following:

val columns=Array("column1", "column2", "column3")
val df=sc.parallelize(Seq(
(1,"example1", Seq(0,2,5)),
(2,"example2", Seq(1,20,5)))).toDF(columns: _*)

(Add : _* after columns in toDF)
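On the second part of the question: the signature above accepts only names, so data types cannot be given inside toDF(); they are inferred from the element types of the tuples. A rough sketch of one way to change a type afterwards, by casting the column, is shown below. It assumes a Spark 2.x or later setup with SparkSession on the classpath (the question uses sc instead), and the names ToDFSketch and namedFrame are made up for this illustration.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.col

object ToDFSketch {
  // Hypothetical helper: builds the question's data and names its columns from a sequence
  def namedFrame(spark: SparkSession, names: Seq[String]): DataFrame = {
    import spark.implicits._ // brings toDF into scope for a Seq of tuples
    Seq(
      (1, "example1", Seq(0, 2, 5)),
      (2, "example2", Seq(1, 20, 5))
    ).toDF(names: _*)
  }

  def main(args: Array[String]): Unit = {
    // Assumption: a local session is acceptable for trying this out
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("toDF-sketch")
      .getOrCreate()

    val df = namedFrame(spark, Seq("column1", "column2", "column3"))
    df.printSchema() // types come from the tuple element types

    // toDF takes no types, so cast a column after the fact if another type is needed
    val widened = df.withColumn("column1", col("column1").cast("long"))
    widened.printSchema()

    spark.stop()
  }
}
```

Casting after toDF keeps the tuple-based construction from the question; building the DataFrame from rows with an explicit schema is another option when every type has to be stated up front.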

shakedzy