I have started exploring Spark Structured Streaming to write some applications, having used DStreams before this.
I am trying to understand the limitations of Structured Streaming now that I have started using it, and would like to know its drawbacks, if any.
Q1. Each sink in a Structured Streaming app reads independently from its source (e.g. Kafka). That means if you read from one topic A and write to 3 places (e.g. ES, Kafka, S3), it actually sets up 3 source connections, independent of each other.
Will this degrade performance, since 3 independent connections must be managed instead of one (the DStream approach)?
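Here is a minimal sketch of the fan-out I mean (the topic names, paths, checkpoint locations, and the Elasticsearch connector's "es" format are placeholders, not from my actual app):

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder.appName("fan-out-example").getOrCreate()

// one logical source...
val input = spark.readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "localhost:9092")
  .option("subscribe", "topicA")
  .load()

// ...but each writeStream starts a separate StreamingQuery, and every
// query plans its own Kafka consumer, so topicA is read three times
val toKafka = input.writeStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "localhost:9092")
  .option("topic", "topicB")
  .option("checkpointLocation", "/tmp/chk-kafka")
  .start()

val toS3 = input.writeStream
  .format("parquet")
  .option("path", "s3a://my-bucket/out")
  .option("checkpointLocation", "/tmp/chk-s3")
  .start()

val toEs = input.writeStream
  .format("es") // elasticsearch-hadoop connector
  .option("checkpointLocation", "/tmp/chk-es")
  .start("my-index/doc")

spark.streams.awaitAnyTermination()
```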
Q2. I know that joining 2 streaming datasets is unsupported. How can I perform calculations on 2 streams?
If I have data from topic A and data from topic B, is it possible to do calculations on both of these somehow?
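For example, one workaround I can think of is unioning the two streams (tagging each row with its origin) and aggregating over the union, since union of two streaming DataFrames is supported. Is something like this the intended approach? A sketch (topic names and the key-based grouping are placeholders):

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

val spark = SparkSession.builder.appName("two-topics").getOrCreate()

def readTopic(topic: String) = spark.readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "localhost:9092")
  .option("subscribe", topic)
  .load()
  .selectExpr("CAST(key AS STRING) AS key", "CAST(value AS STRING) AS value")

// tag each stream so rows remain distinguishable after the union
val streamA = readTopic("topicA").withColumn("source", lit("A"))
val streamB = readTopic("topicB").withColumn("source", lit("B"))

// a single streaming aggregation can then compute over rows
// coming from both topics
val combined = streamA.union(streamB)
  .groupBy(col("key"))
  .count()
```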
Q3. In the Spark Streaming UI, there is a Streaming tab with metrics for viewing the throughput of the application. In Structured Streaming this is no longer available.
Why is this? Is the intention to obtain all metrics programmatically and push them to a separate monitoring service?
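For instance, is the expectation that we attach a StreamingQueryListener and forward its progress events to our own monitoring? A sketch (the println stands in for a real metrics push):

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.streaming.StreamingQueryListener
import org.apache.spark.sql.streaming.StreamingQueryListener._

val spark = SparkSession.builder.appName("metrics-example").getOrCreate()

// receives the same kind of numbers the old Streaming tab showed:
// input rate, processing rate, trigger durations
spark.streams.addListener(new StreamingQueryListener {
  override def onQueryStarted(event: QueryStartedEvent): Unit = ()
  override def onQueryTerminated(event: QueryTerminatedEvent): Unit = ()
  override def onQueryProgress(event: QueryProgressEvent): Unit = {
    val p = event.progress
    // p.json carries the full progress report; here we just print two
    // fields, but this is where a push to a monitoring service would go
    println(s"in=${p.inputRowsPerSecond} rows/s, out=${p.processedRowsPerSecond} rows/s")
  }
})
```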