Using:

  • Apache Spark 2.0.1
  • Java 7

The Apache Spark Java API documentation for the Dataset class includes an example that calls the join method with a scala.collection.Seq parameter to specify the column names, but I cannot get it to work from Java. The documentation gives the following example:

df1.join(df2, Seq("user_id", "user_name"))

Error: Can not find Symbol Method Seq(String)

My Code:

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import scala.collection.Seq;

public class UserProfiles {

    public static void calcTopShopLookup() {
        Dataset<Row> udp = Spark.getDataFrameFromMySQL("my_schema", "table_1");

        // This is the line that fails to compile: Java finds no method named Seq.
        Dataset<Row> result = Spark.getSparkSession().table("table_2").join(udp, Seq("col_1", "col_2"));
    }
}
Tzach Zohar

1 Answer

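Seq(x, y, ...) is the Scala way to create a sequence. Seq has a companion object with an apply method, so Seq(x, y) is shorthand for Seq.apply(x, y) and you do not have to write new each time. Java has no such shorthand, so the compiler looks for a method named Seq and fails.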
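
To make the companion-object idea concrete, here is a rough plain-Java analogy. The Columns class and its methods are hypothetical names invented for illustration; they are not part of Scala or Spark. The static apply factory plays the role of the companion object's apply method, and in Java it always has to be called by name.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for a Scala class with a companion object.
// Not a Scala or Spark API; written for Java 7.
final class Columns {
    private final List<String> names;

    // Private constructor: callers go through the factory below.
    private Columns(List<String> names) {
        this.names = Collections.unmodifiableList(new ArrayList<String>(names));
    }

    // Plays the role of the companion object's apply method.
    // Scala would let you write Columns("a", "b"); Java requires Columns.apply("a", "b").
    static Columns apply(String... names) {
        return new Columns(Arrays.asList(names));
    }

    List<String> names() {
        return names;
    }
}
```

With this sketch, Columns.apply("col_1", "col_2") compiles, while Columns("col_1", "col_2") is rejected by javac in the same way as Seq("col_1", "col_2") in the question.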

It should be possible to write:

import scala.collection.JavaConversions;
import scala.collection.Seq;

import static java.util.Arrays.asList;

Dataset<Row> result = Spark.getSparkSession().table("table_2").join(udp, JavaConversions.asScalaBuffer(asList("col_1", "col_2")));

Or you can write your own small helper method:

// Wraps the given values in a Scala Seq so they can be passed to Scala APIs from Java.
public static <T> Seq<T> asSeq(T... values) {
    return JavaConversions.asScalaBuffer(asList(values));
}
T. Gawęda
  • @TzachZohar Yes, my mistake, I forgot that using a companion object from Java is not so easy ;) Please see the edit. – T. Gawęda Nov 22 '16 at 12:36
  • @TzachZohar Worth noting that it works only if you import `scala.collection.immutable.Seq`, not the `mutable` one or `scala.collection.Seq`. – Łukasz Nov 22 '16 at 12:39
  • @TzachZohar Which version of Scala are you using? In my version, 2.11, I cannot do `new Seq("value")` because Seq is abstract, both `scala.collection.Seq` and the immutable version. – T. Gawęda Nov 22 '16 at 12:44
  • @T.Gawęda Thank you very much! It works with your first suggestion. The second one, from **TzachZohar**, doesn't work, or at least is not that easy, because the Seq class is abstract and I don't want to implement all those methods. – José Carlos Guevara Turruelles Nov 22 '16 at 12:49
  • Credit also goes to @Łukasz, who wrote his answer at the same time as I wrote my edit (the first version was bad; I forgot one thing, which he helped me with). Thanks :) – T. Gawęda Nov 22 '16 at 12:53