
I am trying to visualize Spark Structured Streaming data in Zeppelin. I am able to do this using the memory sink (spark.apache), but it is not a reliable solution for high data volumes. What would be a better solution?

An example implementation or demo would be helpful.

Thanks,

Rilwan


1 Answer


Thanks for asking the question! Having 2+ years of experience developing Spark monitoring tools, I think I can help with this.

There are two types of processing available when data arrives in Spark as a stream.

  1. Discretized Stream or DStream: In this mode, Spark provides the data as RDDs and you have to write your own logic to handle each RDD.
    Pros:
    1. If you want to do some processing before saving the streaming data, an RDD gives you more flexibility than a DataFrame.
    2. DStream provides a nice Streaming UI that shows graphically how much data has been processed. Check this link - https://spark.apache.org/docs/2.2.0/streaming-programming-guide.html#monitoring-applications
    Cons:
    1. Handling raw RDDs is not very convenient or easy.

  2. Structured Streaming: In this mode, Spark provides the data as a DataFrame, and you need to specify where to store or send the data.
    Pros:
    1. Structured Streaming comes with some predefined sources and sinks that cover the most common cases; around 95% of real-life scenarios can be handled by plugging these in. Check this link - https://spark.apache.org/docs/latest/structured-streaming-programming-guide.html
    Cons:
    1. As of Spark 2.x, there is no Streaming UI available for Structured Streaming :( However, you can collect the metrics and build your own UI. Check this link - https://spark.apache.org/docs/latest/structured-streaming-programming-guide.html#monitoring-streaming-queries
    You can also store the metrics in a plaintext file, read the file in Zeppelin through spark.read.json, and plot your own graph.
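A rough sketch of the "store the metrics in a plaintext file" idea, in plain Python so it runs without a cluster. The helper names (append_progress, to_zeppelin_table), the file name, and the sample records are made up for illustration. In a real job each record would come from something like query.lastProgress in PySpark, whose JSON form uses the key names shown here; the output is text in Zeppelin's %table display format, which a Python paragraph can print to get a table or chart. Treat it as a starting point under those assumptions, not a finished monitoring tool.

```python
import json
import os
import tempfile

# Fields copied from each progress record. The key names follow the JSON
# form of Spark's StreamingQueryProgress.
FIELDS = ["timestamp", "numInputRows", "inputRowsPerSecond", "processedRowsPerSecond"]


def append_progress(path, progress):
    """Append one progress record (a dict) to a JSON-lines file."""
    row = {name: progress.get(name) for name in FIELDS}
    with open(path, "a", encoding="utf-8") as handle:
        handle.write(json.dumps(row) + "\n")


def to_zeppelin_table(path):
    """Render the metrics file as Zeppelin %table text (tab-separated)."""
    lines = ["%table " + "\t".join(FIELDS)]
    with open(path, encoding="utf-8") as handle:
        for raw in handle:
            if raw.strip():  # skip blank lines
                row = json.loads(raw)
                lines.append("\t".join(str(row.get(name)) for name in FIELDS))
    return "\n".join(lines)


# Made-up sample records standing in for query.lastProgress in a real job.
sample = [
    {"timestamp": "2019-10-07T06:00:00.000Z", "numInputRows": 100,
     "inputRowsPerSecond": 50.0, "processedRowsPerSecond": 80.0},
    {"timestamp": "2019-10-07T06:00:10.000Z", "numInputRows": 250,
     "inputRowsPerSecond": 25.0, "processedRowsPerSecond": 90.5},
]

# Hypothetical location; on a cluster this would be a path Zeppelin can read.
metrics_path = os.path.join(tempfile.mkdtemp(), "stream_metrics.json")
for record in sample:
    append_progress(metrics_path, record)

table_text = to_zeppelin_table(metrics_path)
print(table_text)
```

Because the file is one JSON object per line, the same file can also be loaded with spark.read.json as described above if you prefer to plot from a DataFrame.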

  • Thanks Sourav. But my question is not about the difference between DStream and Structured Streaming. My question is: what is the best way to use Zeppelin for visualizing the data from a structured stream? – Rilwan Oct 07 '19 at 06:04