I am attempting to pass a list of parameters to a function.

scala> val a = Array("col1", "col2")
a: Array[String] = Array(col1, col2)

I'm trying to use the :_* notation, but it isn't working, and I cannot for the life of me work out why:

val edges = all_edges.select(a:_*)
<console>:27: error: overloaded method value select with alternatives:
(col: String,cols: String*)org.apache.spark.sql.DataFrame <and>
(cols: org.apache.spark.sql.Column*)org.apache.spark.sql.DataFrame
cannot be applied to (String)

This, however, does work: val edges = all_edges.select("col1", "col2")

Not sure if it is relevant, but all_edges is a Spark DataFrame, and I am trying to keep only the columns named in a list.

 scala> all_edges
 res4: org.apache.spark.sql.DataFrame

Any ideas? I've been trying to work out the syntax from, e.g., "Passing elements of a List as parameters to a function with variable arguments", but I don't seem to be getting far.

Edit: I just found "How to "negative select" columns in spark's dataframe", but I am confused as to why the syntax twocol.select(selectedCols.head, selectedCols.tail: _*) is necessary.

undershock

1 Answer

If you want to pass strings, the signature of the function indicates that you have to pass at least one:

(col: String,cols: String*)org.apache.spark.sql.DataFrame

So you have to single out the first element of your list: the compiler cannot determine from the type of a Traversable alone that it is not empty, so a: _* can only fill the String* parameter, never the leading String.

val edges = all_edges.select(a.head, a.tail: _*)

Now, that's the dirty version of it. If you want to do this rigorously, you should check the list is not empty yourself:

// edges is an Option[DataFrame]: None when a is empty
val edges = a.headOption.map(fst => all_edges.select(fst, a.drop(1): _*))
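
To try the head/tail pattern without a Spark session, here is a minimal sketch. It assumes a hypothetical stand-in `select` that has the same parameter shape as the String overload quoted above and simply returns the names it was given; `SelectSketch` and its body are illustrative only, not Spark's implementation.

```scala
object SelectSketch {
  // Hypothetical stand-in for select(col: String, cols: String*):
  // one mandatory String followed by zero or more Strings.
  def select(col: String, cols: String*): List[String] = col :: cols.toList

  def main(args: Array[String]): Unit = {
    val a = Array("col1", "col2")

    // select(a: _*) does not compile here either: `a: _*` can only fill
    // the String* parameter, leaving nothing for the leading `col`.

    // Quick version: a.head throws NoSuchElementException if `a` is empty.
    val quick = select(a.head, a.tail: _*)

    // Safer version: the result is None when `a` is empty.
    val safe = a.headOption.map(fst => select(fst, a.drop(1): _*))

    println(quick)
    println(safe)
  }
}
```

With an empty array, the `headOption` form yields `None` instead of throwing, which is the check the rigorous version is after.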
Francois G
  • Thanks. As a general Scala question: if the function signature is (col: String, cols: String*), why does passing only a single String still work, e.g. `all_edges.select("col1")`? There don't seem to be any other overloads. – undershock Feb 10 '16 at 17:35
  • Because an empty sequence is still a sequence, so that it's acceptable to "interpret" (adapt) `all_edges.select("col1")` as `all_edges.select("col1", Seq(): _*)` – Francois G Feb 10 '16 at 21:11
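
The equivalence described in this comment can be checked in plain Scala. This is a small sketch, again assuming a hypothetical stand-in `select` with the same parameter shape as the Spark overload rather than Spark itself:

```scala
object EmptyVarargsSketch {
  // Hypothetical stand-in with the shape (col: String, cols: String*).
  def select(col: String, cols: String*): List[String] = col :: cols.toList

  def main(args: Array[String]): Unit = {
    // A varargs parameter may receive zero arguments...
    val implicitEmpty = select("col1")
    // ...which matches passing an empty sequence explicitly.
    val explicitEmpty = select("col1", Seq.empty[String]: _*)

    assert(implicitEmpty == explicitEmpty)
    println(implicitEmpty)
  }
}
```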