
I'm trying to filter a Spark DataFrame using a list in Java.

java.util.List<Long> selected = ....;
DataFrame result = df.filter(df.col("something").isin(????));

The problem is that the isin(...) method accepts either a Scala Seq or varargs.

Passing in JavaConversions.asScalaBuffer(selected) doesn't work either.

Any ideas?

Jacek Laskowski
Boris

2 Answers


Use the stream API to convert the list into an array, as follows:

import static org.apache.spark.sql.functions.col;
df.filter(col("something").isin(selected.stream().toArray(Long[]::new)));
Jacek Laskowski
Shankar
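The generator passed to `toArray` must match the list's element type: `selected` is a `List<Long>`, so it needs `Long[]::new`. Using `String[]::new` still compiles, but copying `Long` elements into a `String[]` throws `ArrayStoreException` at runtime. The sketch below checks both cases in plain Java without Spark; `isinStandIn` is a hypothetical stand-in for a varargs method such as `isin`, not part of Spark's API.

```java
import java.util.Arrays;
import java.util.List;

public class ToArrayDemo {
    // Hypothetical stand-in for a varargs API such as Column.isin(Object...).
    // It returns the array it received so the argument binding can be inspected.
    static Object[] isinStandIn(Object... values) {
        return values;
    }

    // Converts the list into an array whose component type matches its elements.
    static Long[] toLongArray(List<Long> selected) {
        return selected.stream().toArray(Long[]::new);
    }

    // Returns true if copying the list's elements into a String[] throws ArrayStoreException.
    static boolean stringArrayFails(List<Long> selected) {
        try {
            selected.stream().toArray(String[]::new);
            return false;
        } catch (ArrayStoreException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        List<Long> selected = Arrays.asList(1L, 2L, 3L);
        Object[] received = isinStandIn(toLongArray(selected));
        System.out.println(Arrays.toString(received)); // [1, 2, 3]
        System.out.println(stringArrayFails(selected)); // true
    }
}
```

Because `Long[]` is assignable to `Object[]`, Java passes the array straight through as the varargs array rather than wrapping it.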

A slightly shorter version:

df.filter(col("something").isin(selected.toArray()));
Popeye
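The shorter version works because `List.toArray()` returns an `Object[]`, and Java uses an array argument directly as the varargs array. Any other single object, such as the `List` itself or a converted Scala collection, is instead wrapped into a one-element array. The sketch below illustrates this binding rule in plain Java; `countArgs` is a hypothetical stand-in for a varargs method like `isin`, not Spark's API.

```java
import java.util.Arrays;
import java.util.List;

public class VarargsBindingDemo {
    // Hypothetical stand-in for a varargs method like Column.isin(Object...):
    // it reports how many values it actually received.
    static int countArgs(Object... values) {
        return values.length;
    }

    public static void main(String[] args) {
        List<Long> selected = Arrays.asList(1L, 2L, 3L);

        // Object[] from toArray() is used as the varargs array itself.
        System.out.println(countArgs(selected.toArray())); // 3

        // A non-array object (the List) is wrapped into a one-element array.
        System.out.println(countArgs(selected)); // 1

        // Casting the array to Object also forces it to be passed as a single value.
        System.out.println(countArgs((Object) selected.toArray())); // 1
    }
}
```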