We are building a system consisting of multiple Spark Streaming applications, each with several receivers. As far as I understand, each receiver needs its own core in the cluster. We need multiple receivers to accommodate peaks, but we don't need them all the time. The applications are deliberately kept small, each doing only one task, so that we can (re)submit them to the cluster without disturbing the other jobs & tasks.
1) Assuming we have 5 jobs with 5 receivers each, we would need at least 25 cores in the cluster just to keep the receivers running, plus the cores for the actual processing. Is that right? (The first sketch after the questions shows how one of our applications sets up its receivers.)
2) Is there any way to allocate resources more dynamically, or is one core strictly bound to one receiver? (The second sketch below shows the kind of configuration we have in mind.)
3) I took a look at the spark-rest-server, which offers the possibility to share a Spark context across different jobs. Could you imagine having one SparkStreamingContext for all (~100) jobs? (The third sketch below shows what we picture.)
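To make question 1 concrete, here is a minimal sketch of how one of our applications wires up its receivers. The app name, host names, port, and core counts are placeholders, not our real configuration:

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object MultiReceiverApp {
  def main(args: Array[String]): Unit = {
    // Each receiver occupies one core for the whole lifetime of the
    // application, so spark.cores.max has to be larger than the number of
    // receivers, otherwise no cores are left for processing the batches.
    val conf = new SparkConf()
      .setAppName("multi-receiver-app")   // placeholder name
      .set("spark.cores.max", "8")        // e.g. 5 receiver cores + 3 processing cores

    val ssc = new StreamingContext(conf, Seconds(10))

    // Five socket receivers as a stand-in for our real sources; every
    // socketTextStream call starts one receiver on one dedicated core.
    val streams = (1 to 5).map(i => ssc.socketTextStream(s"host$i", 9999))
    val unified = ssc.union(streams)

    unified.count().print()

    ssc.start()
    ssc.awaitTermination()
  }
}
```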
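For question 2, this is the kind of configuration we would hope could work on the standalone cluster. Whether these standard dynamic-allocation settings help at all while the receivers keep their cores pinned is exactly what we are unsure about; all values are examples:

```scala
import org.apache.spark.SparkConf

object DynamicAllocationSketch {
  // Hypothetical settings for question 2: plain executor-level dynamic
  // allocation. Receivers still seem to pin one core each for as long as
  // the application runs, so it is unclear how much this would buy us.
  val conf = new SparkConf()
    .setAppName("streaming-app")                        // placeholder name
    .set("spark.dynamicAllocation.enabled", "true")     // let Spark add/remove executors
    .set("spark.shuffle.service.enabled", "true")       // external shuffle service, required for dynamic allocation
    .set("spark.dynamicAllocation.minExecutors", "1")   // example values
    .set("spark.dynamicAllocation.maxExecutors", "10")
}
```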
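And for question 3, this is roughly what we imagine a shared context would look like: many small, independent pipelines registered on one StreamingContext (again, sources and hosts are placeholders). Our main doubt is that nothing can be attached once the context has been started, so (re)submitting a single job on its own would no longer be possible:

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object SharedContextSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("shared-streaming-context") // placeholder name
    val ssc = new StreamingContext(conf, Seconds(10))

    // Hypothetical: register many independent pipelines on the same context.
    // Each output operation becomes its own job per batch, but they all share
    // one driver, one application and one pool of receiver cores.
    val pipelines: Seq[StreamingContext => Unit] = Seq(
      ctx => ctx.socketTextStream("hostA", 9999).filter(_.contains("ERROR")).print(),
      ctx => ctx.socketTextStream("hostB", 9999).count().print()
      // ... ~100 of these in our case
    )
    pipelines.foreach(p => p(ssc))

    // Once start() is called, no further pipelines can be added, which is
    // why we are unsure this scales to (re)submitting individual jobs.
    ssc.start()
    ssc.awaitTermination()
  }
}
```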
For context: we are running the cluster in standalone mode, co-located with a Cassandra cluster on the same nodes.