Data from both topics is joined at one point and finally sent to a Kafka sink. Which is the best way to read from multiple topics:
val df = spark
  .readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", servers)
  .option("subscribe", "t1,t2")
  .load()
vs
val df1 = spark
  .readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", servers)
  .option("subscribe", "t1")
  .load()

val df2 = spark
  .readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", servers)
  .option("subscribe", "t2")
  .load()
Somewhere I will do df1.join(df2) and send the result to the Kafka sink.
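A minimal sketch of the second approach end to end (two streams, a stream-stream join, then the Kafka sink). It assumes Spark 2.3+ (where stream-stream joins are available), the spark-sql-kafka-0-10 package on the classpath, that both topics share a join key in the Kafka record key, and that the Kafka record timestamp serves as event time. The broker address, the output topic `joined_output`, the checkpoint path, and the watermark/interval values are hypothetical placeholders.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.expr

object JoinTwoTopics {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("join-t1-t2").getOrCreate()
    val servers = "localhost:9092" // assumption: replace with your brokers

    // One streaming DataFrame per topic; columns are prefixed so both sides stay distinct.
    def readTopic(topic: String, alias: String): DataFrame =
      spark.readStream
        .format("kafka")
        .option("kafka.bootstrap.servers", servers)
        .option("subscribe", topic)
        .load()
        .selectExpr(
          s"CAST(key AS STRING) AS ${alias}_key",
          s"CAST(value AS STRING) AS ${alias}_value",
          s"timestamp AS ${alias}_ts")
        // Watermark (hypothetical 10 minutes) bounds how long rows are kept in join state.
        .withWatermark(s"${alias}_ts", "10 minutes")

    val df1 = readTopic("t1", "t1")
    val df2 = readTopic("t2", "t2")

    // Inner join on key, restricted to a time range so old state can be dropped.
    val joined = df1.join(
      df2,
      expr("""
        t1_key = t2_key AND
        t2_ts BETWEEN t1_ts - INTERVAL 5 MINUTES AND t1_ts + INTERVAL 5 MINUTES
      """))

    // The Kafka sink needs a string or binary `value` column; `key` is optional.
    val query = joined
      .selectExpr("t1_key AS key", "concat(t1_value, '|', t2_value) AS value")
      .writeStream
      .format("kafka")
      .option("kafka.bootstrap.servers", servers)
      .option("topic", "joined_output") // hypothetical output topic
      .option("checkpointLocation", "/tmp/checkpoints/join-t1-t2") // hypothetical path
      .start()

    query.awaitTermination()
  }
}
```

With a single `subscribe` of "t1,t2" instead, the combined stream carries a `topic` column, so the rows would have to be split with a filter on `topic` before the two sides could be joined.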
In terms of performance and resource usage, which would be the better option here?
Thanks in advance
PS: I saw another similar question, "Spark structured streaming app reading from multiple Kafka topics", but there the DataFrames from the two topics do not seem to be used together.