2

I am curious to know how I can implement a SQL-like EXISTS clause the Spark DataFrame way.

Sagar patro
  • Possible duplicate of [Spark replacement for EXISTS and IN](https://stackoverflow.com/questions/34861516/spark-replacement-for-exists-and-in) – pault Jan 08 '20 at 15:35

2 Answers

4

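A LEFT SEMI JOIN is the Spark equivalent of a SQL EXISTS clause: it returns the rows of the left DataFrame that have at least one match in the right one, and only the left DataFrame's columns.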

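import spark.implicits._  // needed for toDF; assumes a SparkSession named spark, as in spark-shell

val cityDF = Seq(("Delhi", "India"), ("Kolkata", "India"), ("Mumbai", "India"), ("Nairobi", "Kenya"), ("Colombo", "Srilanka")).toDF("City", "Country")

val codeDF = Seq(("011", "Delhi"), ("022", "Mumbai"), ("033", "Kolkata"), ("044", "Chennai")).toDF("Code", "City")

// Keep only the cityDF rows whose City also appears in codeDF
val finalDF = cityDF.join(codeDF, cityDF("City") === codeDF("City"), "left_semi")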
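To make the equivalence concrete, here is a minimal sketch that runs the same check three ways: as a SQL EXISTS subquery over temp views, as a left semi join, and (for NOT EXISTS) as a left anti join. The view names `city` and `code`, the aliases `c` and `k`, and the local `SparkSession` setup are assumptions added for illustration; they are not part of the original answer.

```scala
import org.apache.spark.sql.SparkSession

// Assumption: a local session for illustration; in spark-shell getOrCreate() reuses the existing one.
val spark = SparkSession.builder().master("local[*]").appName("exists-sketch").getOrCreate()
import spark.implicits._

val cityDF = Seq(("Delhi", "India"), ("Kolkata", "India"), ("Mumbai", "India"),
  ("Nairobi", "Kenya"), ("Colombo", "Srilanka")).toDF("City", "Country")
val codeDF = Seq(("011", "Delhi"), ("022", "Mumbai"), ("033", "Kolkata"),
  ("044", "Chennai")).toDF("Code", "City")

// Hypothetical view names, used only so the SQL below can refer to the DataFrames.
cityDF.createOrReplaceTempView("city")
codeDF.createOrReplaceTempView("code")

// SQL form: a correlated EXISTS subquery.
val viaSql = spark.sql(
  """SELECT * FROM city c
    |WHERE EXISTS (SELECT 1 FROM code k WHERE k.City = c.City)""".stripMargin)

// DataFrame form of EXISTS: left semi join (keeps Delhi, Kolkata, Mumbai).
val viaJoin = cityDF.join(codeDF, cityDF("City") === codeDF("City"), "left_semi")

// DataFrame form of NOT EXISTS: left anti join (keeps Nairobi, Colombo).
val viaAnti = cityDF.join(codeDF, Seq("City"), "left_anti")

viaSql.show()
viaJoin.show()
viaAnti.show()
```

If the rest of the query is already written in SQL, keeping the EXISTS subquery inside `spark.sql` may be simpler than adding another join to the DataFrame chain.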

venus
  • Is there any way other than a join? I already have too many joins in my SQL query and I don't want to make it more complex with this join syntax. Any other approach would be appreciated. – Sagar patro Jan 07 '20 at 11:27
  • Sorry bro :( this is all I know. Post another question with a good explanation. – venus Jan 07 '20 at 11:36
1

If the data to compare against is small, such as a list that can be broadcast, you can use isin:

df.filter(col("columnName").isin(list: _*))
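As a rough sketch of the `isin` approach, the snippet below filters a DataFrame against a small in-memory list and also shows the negated form, which behaves like NOT EXISTS for this case. The sample data, the list `wantedCities`, and the local `SparkSession` setup are assumptions made for illustration.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

// Assumption: a local session for illustration; in spark-shell getOrCreate() reuses the existing one.
val spark = SparkSession.builder().master("local[*]").appName("isin-sketch").getOrCreate()
import spark.implicits._

val cityDF = Seq(("Delhi", "India"), ("Kolkata", "India"), ("Mumbai", "India"),
  ("Nairobi", "Kenya"), ("Colombo", "Srilanka")).toDF("City", "Country")

// Hypothetical small lookup list held on the driver.
val wantedCities = Seq("Delhi", "Mumbai", "Kolkata", "Chennai")

// isin takes varargs, so a Scala collection is expanded with ": _*".
val matched = cityDF.filter(col("City").isin(wantedCities: _*))

// Negating the condition keeps the rows whose City is not in the list.
val unmatched = cityDF.filter(!col("City").isin(wantedCities: _*))

matched.show()
unmatched.show()
```

This avoids a join entirely, but the whole list is embedded in the query plan, so it is only reasonable when the list stays small.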

Topde
Salim