
This question was asked in an interview: how many SparkContexts are allowed to be created per JVM, and why? I know that only one SparkContext is allowed per JVM, but I can't understand why. Would anyone please help me understand the reason behind "one SparkContext per JVM"?

user2017

1 Answer


The answer is simple: Spark has not been designed to work with multiple contexts. Quoting Reynold Xin:

I don't think we currently support multiple SparkContext objects in the same JVM process. There are numerous assumptions in the code base that use a shared cache, thread-local variables, or some global identifiers, which prevent us from using multiple SparkContexts.
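The "global identifiers" point can be illustrated with a toy sketch of the guard pattern SparkContext uses: a process-wide variable records the single active context, and constructing a second one while the first is still running is rejected. This is a simplified illustration in plain Python, not Spark's actual code; the class name and messages are made up.

```python
import threading

class SparkContextSketch:
    """Toy stand-in for SparkContext: at most one active instance per process.

    A class-level (i.e. process-global) slot holds the active context, the
    kind of shared global state the quote above refers to.
    """
    _lock = threading.Lock()
    _active = None  # the one shared "global identifier"

    def __init__(self, app_name):
        with SparkContextSketch._lock:
            if SparkContextSketch._active is not None:
                raise RuntimeError(
                    "Only one SparkContextSketch may be active in this process"
                )
            SparkContextSketch._active = self
        self.app_name = app_name

    def stop(self):
        # Releasing the global slot is what makes a *successor* context legal.
        with SparkContextSketch._lock:
            if SparkContextSketch._active is self:
                SparkContextSketch._active = None

sc = SparkContextSketch("app1")
try:
    SparkContextSketch("app2")   # rejected: the global slot is taken
except RuntimeError as e:
    print(e)
sc.stop()                        # after stop(), a new context is allowed
sc3 = SparkContextSketch("app3")
```

Real Spark enforces the same rule inside `SparkContext`: stopping the running context is the only supported way to start another one in the same JVM.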

In a broader sense, a single application (with a single main), in a single JVM, is the standard approach in the Java world (Is there one JVM per Java application?, Why have one JVM per application?). Application servers choose a different approach, but they are the exception, not the rule.

From a practical point of view, handling a single data-intensive application is painful enough (tuning GC, dealing with leaking resources, communication overhead). Multiple Spark applications running in a single JVM would be impossible to tune and manage in the long run.

Finally, there would not be much use in having multiple contexts, as each distributed data structure is tightly coupled to the context that created it.
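That coupling can be sketched with a toy model: each "RDD" keeps a reference to the context that created it, and operations refuse to mix data from different contexts, because in real Spark the context owns the scheduler and cache that the data lives in. Again a simplified, hypothetical sketch, not Spark's API.

```python
class ToyContext:
    """Toy stand-in for SparkContext."""
    def __init__(self, name):
        self.name = name

    def parallelize(self, data):
        # Every collection is born from, and bound to, one context.
        return ToyRDD(self, list(data))

class ToyRDD:
    """Toy distributed collection that remembers its owning context."""
    def __init__(self, ctx, data):
        self.ctx = ctx
        self.data = data

    def union(self, other):
        # Combining data only makes sense within a single context: the
        # context is where the partitions, scheduler, and cache live.
        if other.ctx is not self.ctx:
            raise ValueError("Cannot combine RDDs from different contexts")
        return ToyRDD(self.ctx, self.data + other.data)

ctx_a = ToyContext("a")
ctx_b = ToyContext("b")
r1 = ctx_a.parallelize([1, 2])
r2 = ctx_a.parallelize([3])
print(r1.union(r2).data)     # same context: fine -> [1, 2, 3]
r3 = ctx_b.parallelize([4])
try:
    r1.union(r3)             # different contexts: rejected
except ValueError as e:
    print(e)
```

So even if two contexts could coexist, their RDDs and DataFrames could not be combined, which removes most of the motivation for having more than one.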