Is it possible to run Shark queries over the data contained in the DStreams of a Spark Streaming application (for instance, inside a foreachRDD call)?

Is there a specific API for that?

Thanks.

gprivitera
  • I was also trying to find out whether this is possible. Based on the docs on the website, it doesn't look possible. If you figure something out, please let me know. – Pravesh Jain Jun 11 '14 at 18:58
  • The most similar thing we could use is Spark SQL. I think they are rewriting Shark on top of it, so it will basically be almost the same thing. https://spark.apache.org/docs/latest/sql-programming-guide.html – gprivitera Jun 11 '14 at 19:01
  • But Spark SQL works over batch data. Maybe they will release it for streaming data soon? – Pravesh Jain Jun 11 '14 at 19:03
  • I haven't tried it yet, but I think (and it seems from the docs) that with Spark SQL you can create SchemaRDDs from existing RDDs (so also from DStreams, thanks to the foreachRDD() function) and run SQL-like queries on them. – gprivitera Jun 11 '14 at 19:05
  • Creating a SqlContext and a StreamingContext in the same app doesn't seem to work. It exits as soon as I call the foreachRDD() method of JavaDStream. – Pravesh Jain Jun 12 '14 at 09:39
  • That's really strange. Does it give you some kind of error? – gprivitera Jun 12 '14 at 14:04
  • Nope, just hangs there. Nothing even printed in the worker logs. I could share the code with you somewhere... – Pravesh Jain Jun 12 '14 at 17:22
  • Let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/55523/discussion-between-gprivi-and-pravesh-jain). – gprivitera Jun 12 '14 at 17:24

1 Answer

To answer my own question, in case someone else runs into the same problem: no, you cannot run Shark directly on Spark Streaming data.

Spark SQL is currently a valid alternative; at least it was for my needs. It is included in Spark and requires no additional configuration. You can have a look at it here: http://spark.apache.org/docs/latest/sql-programming-guide.html
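
For anyone who wants to see the idea in code, here is a minimal sketch of running a SQL query on each micro-batch inside foreachRDD. It is an illustration under assumptions, not the code I used at the time: it relies on the later SparkSession/DataFrame API rather than the SQLContext/SchemaRDD API that existed when this was written, and the names `Event`, `sensor`, `reading`, and `StreamingSqlSketch` are made up for the example. The input comes from `queueStream` only so that the sketch needs no external source.

```scala
import org.apache.spark.SparkConf
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.SparkSession
import org.apache.spark.streaming.{Seconds, StreamingContext}
import scala.collection.mutable

// Hypothetical record type, defined at top level so Spark can derive a schema for it.
case class Event(sensor: String, reading: Int)

object StreamingSqlSketch {
  def main(args: Array[String]): Unit = {
    // Local master with two threads; an assumption for a quick local run.
    val conf = new SparkConf().setMaster("local[2]").setAppName("StreamingSqlSketch")
    val ssc = new StreamingContext(conf, Seconds(1))

    // queueStream turns pre-built RDDs into micro-batches; a real job would read from a live source.
    val queue = mutable.Queue[RDD[Event]](
      ssc.sparkContext.parallelize(Seq(Event("a", 1), Event("b", 2), Event("a", 3)))
    )
    val events = ssc.queueStream(queue)

    events.foreachRDD { rdd =>
      // Reuse (or lazily create) a single SparkSession on top of the streaming job's SparkContext.
      val spark = SparkSession.builder.config(rdd.sparkContext.getConf).getOrCreate()
      import spark.implicits._

      // Expose the current batch as a temporary view and query it with SQL.
      rdd.toDF().createOrReplaceTempView("events")
      spark.sql("SELECT sensor, SUM(reading) AS total FROM events GROUP BY sensor").show()
    }

    ssc.start()
    ssc.awaitTerminationOrTimeout(5000) // run for about five seconds, then shut down
    ssc.stop(stopSparkContext = true, stopGracefully = false)
  }
}
```

The temporary view is replaced on every batch, so each query only sees the data of the current micro-batch; anything that must span several batches would need windowing or external state.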

gprivitera