
I would like to test user input against a whitelist of the join types available in Spark.

Is there a way to get the list of join types from a Spark built-in?

For instance, I would like to validate the user's input against this Seq: Seq("inner", "cross", "outer", "full", "fullouter", "left", "leftouter", "right", "rightouter", "leftsemi", "leftanti")

(which are all the join types available in Spark) without hardcoding it, as I have just done.

BlueSheepToken

2 Answers


I adapted the answer from this question here. You could also keep the join types in a JSON file and read it at runtime; see this answer for JSON object handling: JsonParsing.
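
For instance, a minimal sketch of loading the whitelist from a JSON file at runtime (the file name joinTypes.json and its line-delimited layout are assumptions, not from the original answer):

import org.apache.spark.sql.SparkSession

object JoinTypesFromJson extends App {
  val spark = SparkSession.builder().master("local[*]").getOrCreate()

  // Assumed layout: one JSON object per line, e.g. {"joinType": "inner"}
  val joinTypes: Seq[String] = spark.read
    .json("joinTypes.json")
    .select("joinType")
    .collect()
    .map(_.getString(0))
    .toSeq

  println(joinTypes.mkString(", "))
}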

Update 1: I updated the answer to follow the way Spark itself handles this, see JoinType.

import org.apache.spark.sql.SparkSession


object SparkSandbox extends App {

  case class Row(id: Int, value: String)

  private[this] implicit val spark = SparkSession.builder().master("local[*]").getOrCreate()

  import spark.implicits._

  spark.sparkContext.setLogLevel("ERROR")

  val r1 = Seq(Row(1, "A1"), Row(2, "A2"), Row(3, "A3"), Row(4, "A4")).toDS()
  val r2 = Seq(Row(3, "A3"), Row(4, "A4"), Row(4, "A4_1"), Row(5, "A5"), Row(6, "A6")).toDS()
  val validUserJoinType = "inner"
  val invalidUserJoinType = "nothing"

  // Join types exercised in the demo loop below ("cross" is omitted because it takes no join columns)
  val joinTypes = Seq("inner", "outer", "full", "full_outer", "left", "left_outer", "right", "right_outer", "left_semi", "left_anti")

  // Every string accepted by Dataset.join, mirroring JoinType.apply in the Spark source
  val supported = Seq(
    "inner",
    "outer", "full", "fullouter", "full_outer",
    "leftouter", "left", "left_outer",
    "rightouter", "right", "right_outer",
    "leftsemi", "left_semi",
    "leftanti", "left_anti",
    "cross")

  invalidUserJoinType match {
    case x if supported.contains(x) =>
      println("do some logic")
      joinTypes.foreach { joinType =>
        println(s"${joinType.toUpperCase()} JOIN")
        r1.join(right = r2, usingColumns = Seq("id"), joinType = joinType).orderBy("id").show()
      }
    case x =>
      throw new IllegalArgumentException(s"Unsupported join type '$x'. " +
        "Supported join types include: " + supported.mkString("'", "', '", "'") + ".")
  }

}
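
With the valid input ("inner") the match prints each join type and shows the joined result ordered by id; with the invalid input ("nothing") it throws an IllegalArgumentException listing the supported join types, mirroring Spark's own error message.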
Moustafa Mahmoud
  • Thank you very much for your answer, but I am actually looking for a built-in function; the join types are still hardcoded, just not in the code anymore but in a JSON file. Do you know if there is a built-in for this? – BlueSheepToken Jan 04 '19 at 12:52
  • But if you check the Spark source, it is hardcoded; they didn't add something like a case class or types: https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala – Moustafa Mahmoud Jan 04 '19 at 13:01
  • https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/joinTypes.scala Apparently you are right, it is hardcoded in the apply method :( – BlueSheepToken Jan 04 '19 at 13:20
  • I updated the answer to follow the same approach Spark itself uses. – Moustafa Mahmoud Jan 04 '19 at 13:26

Sorry, this is not possible without a PR to the Spark project itself. The join types are defined inline in JoinType. There are classes that extend JoinType, but their naming convention differs from the strings used in the case statement, so you're out of luck, I'm afraid.

https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/joinTypes.scala
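
That said, since JoinType.apply is where unsupported strings are rejected, one could delegate validation to Spark itself instead of copying the list. A minimal sketch, assuming you are willing to depend on Catalyst internals (not a stable public API):

import org.apache.spark.sql.catalyst.plans.JoinType

import scala.util.Try

object JoinTypeValidation {
  // Delegate validation to Spark's own parser: JoinType.apply throws an
  // IllegalArgumentException for any unsupported string, and its error
  // message already lists every supported join type.
  def isValidJoinType(userInput: String): Boolean = Try(JoinType(userInput)).isSuccess
}

For example, isValidJoinType("inner") returns true and isValidJoinType("nothing") returns false.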

David