I'm new to Spark/Shark and have spun up a cluster with three Spark workers. I started installing Shark on the same three servers, but I'm coming to the conclusion that maybe that's not needed and only one Shark server is necessary -- I can't find anything that speaks to this in the documentation. Do I only need one Shark server, since Spark/Hive will be doing the heavy lifting, or do I need to install it on every server where Spark resides?
Your question isn't really clear: what exactly do you want to do with Shark? Installing it on only one server instead of three means it will have roughly 1/3 of the computational power. – gprivitera Jun 27 '14 at 22:57
2 Answers
Shark is a Spark application, just like a WordCount job or the Spark shell. You only need it on the client machine from which you are going to send queries.
If the Shark JARs are not present on the worker machines, they have to be attached to the SparkContext.
The Shark server works a little like 'screen' on Unix systems: it is itself an application running on Spark. You connect to the Shark server with the Shark console, send your queries, and the Shark server executes them on Spark on your behalf.
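The flow described above can be sketched with the Shark launch scripts. This is a sketch assuming the standard `bin/shark` script from the Shark distribution; the host name and port here are placeholders, not values from the question:

```shell
# On ONE machine only: start the Shark server, a long-running Spark
# application that holds the SparkContext and runs queries on the
# cluster on behalf of connected clients.
./bin/shark --service sharkserver -p 10000

# From any client machine: attach a Shark console to that server,
# much like attaching to a 'screen' session, and issue queries.
./bin/shark -h shark-server-host -p 10000
```

The point is that only the client/server machine needs the Shark installation; the Spark workers just need the Shark JARs available (shipped via the SparkContext if necessary).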

Jacek L.
Assuming that by Shark you mean the ThriftServer, then you only need one Shark server per (Spark) cluster.
This carries over even to Spark 1.0.1, where Shark is retired, because the ThriftServer has been brought into the Spark core itself.
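A minimal sketch of the post-Shark equivalent, assuming a stock Spark distribution with the Thrift JDBC server (the host and port below are illustrative defaults, not values from the question):

```shell
# On one node: start the Thrift JDBC server that replaced Shark.
# Like the Shark server, it is a single Spark application serving
# the whole cluster, so one instance per cluster suffices.
./sbin/start-thriftserver.sh

# From any client machine: connect with beeline over JDBC and run SQL.
./bin/beeline -u jdbc:hive2://localhost:10000
```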

overcoil