This question was asked in an interview: how many SparkContexts are allowed to be created per JVM, and why? I know only one SparkContext is allowed per JVM, but I can't understand why. Would anyone please help me understand the reason behind "one SparkContext per JVM"?
1 Answer
The answer is simple: Spark was not designed to work with multiple contexts. Quoting Reynold Xin:
I don't think we currently support multiple SparkContext objects in the same JVM process. There are numerous assumptions in the code base that use a shared cache or thread local variables or some global identifiers which prevent us from using multiple SparkContext's.
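You can see the restriction directly. Here is a minimal sketch (Scala, assuming Spark 2.x running with a local master), where the check is enforced in the SparkContext constructor itself:

```scala
import org.apache.spark.{SparkConf, SparkContext}

object MultipleContextsDemo extends App {
  val conf = new SparkConf()
    .setAppName("first-context")
    .setMaster("local[*]")

  val sc1 = new SparkContext(conf) // fine - first context in this JVM

  // The second constructor call fails fast:
  // org.apache.spark.SparkException: Only one SparkContext may be
  // running in this JVM (see SPARK-2243).
  val sc2 = new SparkContext(conf)
}
```

(Spark 1.x/2.x shipped an unsupported escape hatch, spark.driver.allowMultipleContexts=true, which merely downgraded the error to a warning; it was removed in Spark 3.0.)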
In a broader sense, a single application (with a single main method) per JVM is the standard approach in the Java world (see "Is there one JVM per Java application?" and "Why have one JVM per application?"). Application servers choose a different approach, but they are the exception, not the rule.
From a practical point of view, handling a single data-intensive application is painful enough (tuning GC, dealing with leaking resources, communication overhead). Multiple Spark applications running in a single JVM would be impossible to tune and manage in the long run.
Finally, there would not be much use in having multiple contexts, as each distributed data structure is tightly bound to the context that created it (see the sketch below).
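To illustrate that coupling, a small sketch (Scala, same hypothetical local setup as above): every RDD keeps a reference back to the SparkContext that created it, and becomes unusable the moment that context is stopped:

```scala
import org.apache.spark.{SparkConf, SparkContext}

object ContextCouplingDemo extends App {
  val sc = new SparkContext(
    new SparkConf().setAppName("rdd-coupling").setMaster("local[*]"))

  val rdd = sc.parallelize(1 to 10)
  assert(rdd.sparkContext eq sc) // the RDD points back to its own context

  sc.stop()

  // Any action on the orphaned RDD now fails:
  // java.lang.IllegalStateException:
  //   Cannot call methods on a stopped SparkContext.
  rdd.count()
}
```

So even if a second context were allowed, an RDD from one context could never be used with another; each context is its own isolated world of caches, executors, and lineage.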
