I am new to spark streaming. I am trying to do some exercises on fetching data from kafka and joining with hive table.i am not sure how to do JOIN in spark streaming (not the structured streaming). Here is my code
val ssc = new StreamingContext("local[*]", "KafkaExample", Seconds(1))
val kafkaParams = Map[String, Object](
"bootstrap.servers" -> "dofff2.dl.uk.feefr.com:8002",
"security.protocol" -> "SASL_PLAINTEXT",
"key.deserializer" -> classOf[StringDeserializer],
"value.deserializer" -> classOf[StringDeserializer],
"group.id" -> "1",
"auto.offset.reset" -> "latest",
"enable.auto.commit" -> (false: java.lang.Boolean)
)
val topics = Array("csvstream")
val stream = KafkaUtils.createDirectStream[String, String](
ssc,
PreferConsistent,
Subscribe[String, String](topics, kafkaParams)
)
val strmk = stream.map(record => (record.value,record.timestamp))
Now i want to do join on one of the table in hive. In spark structured streaming i can directly call spark.table("table nanme") and do some join, but in spark streaming how can i do it since its everything based on RDD. can some one help me ?