
I have a question about creating multiple Spark sessions in one JVM. I have read that creating multiple contexts is not recommended in earlier versions of Spark. Is this true of SparkSession in Spark 2.0 as well?

I am thinking of making a call to a web service or a servlet from the UI; the service creates a Spark session, performs some operation, and returns the result. This means a Spark session will be created for every request from the client side. Is this practice recommended?

Say I have a method something like this:

import org.apache.spark.sql.SparkSession;

public void runSpark() throws Exception {

        SparkSession spark = SparkSession
          .builder()
          .master("spark://<masterURL>")
          .appName("JavaWordCount")
          .getOrCreate();

and so on....

If I put this method in a web service, will there be any JVM issues? I am able to invoke this method multiple times from a main method, but I am not sure whether it is good practice.

Rishi S

4 Answers


The documentation of getOrCreate states

This method first checks whether there is a valid thread-local SparkSession, and if yes, return that one. It then checks whether there is a valid global default SparkSession, and if yes, return that one. If no valid global default SparkSession exists, the method creates a new SparkSession and assigns the newly created SparkSession as the global default.
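
So if you call getOrCreate twice in the same JVM, the second call just returns the existing session. A minimal sketch in Scala (the local master and app name are placeholders):

    import org.apache.spark.sql.SparkSession

    val s1 = SparkSession.builder().master("local[*]").appName("demo").getOrCreate()
    // The builder finds the existing global default session and returns it
    // instead of creating a new one:
    val s2 = SparkSession.builder().getOrCreate()
    println(s1 eq s2)  // true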

There is also the method SparkSession.newSession, whose documentation states

Start a new session with isolated SQL configurations, temporary tables, registered functions are isolated, but sharing the underlying SparkContext and cached data.

So I guess the answer to your question is that you can have multiple sessions, but there is still a single SparkContext per JVM that is shared by all of them.
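
You can check this sharing directly (a small sketch, assuming an existing session named spark):

    val other = spark.newSession()
    println(spark eq other)                            // false: two distinct sessions
    println(spark.sparkContext eq other.sparkContext)  // true: one shared SparkContext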

I could imagine that a possible scenario for your web application would be to create one SparkSession per request or, e.g., per HTTP session, and use it to isolate Spark executions per request or user session <-- since I'm pretty new to Spark, can someone confirm this?
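
What I have in mind is something like this hypothetical sketch (handleRequest and rootSession are made-up names; rootSession would be a SparkSession created once at application startup):

    def handleRequest(query: String): Array[org.apache.spark.sql.Row] = {
      // Each request gets its own session with isolated SQL config, temp views
      // and registered functions, while all requests share this JVM's SparkContext.
      val session = rootSession.newSession()
      session.sql(query).collect()
    }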

Peter Rietzler
  • I've created another question that is closely related to this one. See http://stackoverflow.com/questions/43013542/creating-many-short-living-sparksessions – Peter Rietzler Mar 25 '17 at 07:03

If you have an existing Spark session and want to create a new one, use the newSession method on the existing SparkSession.

import org.apache.spark.sql.SparkSession

// `spark` is the existing session
val newSparkSession = spark.newSession()

The newSession method creates a new Spark session with isolated SQL configurations and temporary tables. The new session will share the underlying SparkContext and cached data.
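
The isolation is easy to see with temporary views (a minimal sketch, assuming spark is an existing session):

    spark.range(10).createOrReplaceTempView("numbers")
    val newSparkSession = spark.newSession()
    // Temp views are per-session, so the new session does not see this one:
    println(spark.catalog.tableExists("numbers"))           // true
    println(newSparkSession.catalog.tableExists("numbers")) // false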

moriarty007

Multiple SparkContexts in the same JVM are not supported and won't be: SPARK-2243 is resolved as Won't Fix.

If you need multiple contexts, there are different projects which can help you (Mist, Livy).

  • Since the question talks about SparkSessions, it's important to point out that there can be multiple `SparkSession`s running but only a single `SparkContext` per JVM. – Amit Singh Mar 15 '21 at 03:21
  • Also, if we go by [SPARK-26362](https://issues.apache.org/jira/browse/SPARK-26362), this was supported before Spark 3.0.0 but now the support has been removed since it causes arbitrary issues. – Amit Singh Mar 15 '21 at 03:27
  • Not sure why this answer was accepted. The question was about multiple Spark sessions, not about multiple Spark contexts. Apparently there is a way to create multiple sessions, like `val newSparkSession = spark.newSession()`, but what is still not clear to me is how to close those sessions, since `close()` on a session is an alias of `stop()`, which stops the context. – Anton Gorev Jul 31 '23 at 16:57
  • I suspect a session doesn't need to be closed explicitly and will just be removed by the garbage collector when there are no references to it anymore. Probably `clearActiveSession()` should be called to remove it from Spark's thread-to-session map. But that is just my guess; please let me know if someone knows for sure. – Anton Gorev Jul 31 '23 at 17:10

You can call getOrCreate multiple times.

This function may be used to get or instantiate a SparkContext and register it as a singleton object. Because we can only have one active SparkContext per JVM, this is useful when applications may wish to share a SparkContext.

getOrCreate creates a SparkContext in the JVM if no SparkContext is available. If a SparkContext is already available in the JVM, it doesn't create a new one but returns the existing one.
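
For example (a sketch with a local master as a placeholder):

    import org.apache.spark.{SparkConf, SparkContext}

    val sc1 = SparkContext.getOrCreate(new SparkConf().setMaster("local[*]").setAppName("demo"))
    // A SparkContext is already active in this JVM, so this call returns it:
    val sc2 = SparkContext.getOrCreate()
    println(sc1 eq sc2)  // true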

bob