40

I have a Spark application which uses the new Spark 2.0 API with SparkSession. I am building this application on top of another application which uses SparkContext. I would like to pass the SparkContext to my application and initialize the SparkSession using the existing SparkContext.

However, I could not find a way to do that. I found that the SparkSession constructor that takes a SparkContext is private, so I can't initialize it that way, and the builder does not offer any setSparkContext method. Do you think there is some workaround?
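To illustrate, this is roughly what I was hoping to write (hypothetical; neither line works):

// Does not compile -- the SparkSession constructor is private[sql]:
// val spark = new SparkSession(sc)

// Hypothetical -- the builder exposes no such method publicly:
// val spark = SparkSession.builder.setSparkContext(sc).getOrCreate()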

Stefan Repcek
  • 2,553
  • 4
  • 21
  • 29
  • I'm not very sure, but according to my knowledge there is no workaround – Balaji Reddy Mar 21 '17 at 18:22
  • yea :( so if there is no workaround there are two options left: using SparkContext in my application, or adding SparkSession support to the application I am building on top of (it is spark-jobserver; I am using their spark-2.0-preview branch, but they still use SparkContext) – Stefan Repcek Mar 21 '17 at 18:30
  • You only need to add support for an external SparkContext to the application and access the session.sparkContext. Shouldn't be a big issue. – matfax Mar 21 '17 at 22:21
  • can you explain more what you mean by "add support for an external SparkContext"? I read you should use just one instance of SparkContext – Stefan Repcek Mar 21 '17 at 23:31
  • I suppose the application creates its own SparkContext. Since you only want one SparkContext (for good reasons), you need to add a parameter to the application's constructor or builder that accepts the external SparkContext that you already created using the session builder. – matfax Mar 22 '17 at 01:10
  • the problem is the application I am using (spark-jobserver) doesn't allow passing my SparkContext, it creates its own – Stefan Repcek Mar 22 '17 at 11:03
  • That's why you need to edit the code of spark-jobserver (the application) so it does not create its own. Fork it, make your modifications, and publish it (e.g., with Jitpack); see the sketch after this thread. As Balaji said, there is no workaround. The only alternative is to edit Spark itself, which I wouldn't recommend. – matfax Mar 22 '17 at 12:12
  • Let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/138731/discussion-between-matthias-fax-and-stevesk). – matfax Mar 22 '17 at 12:27
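A minimal sketch of what matfax suggests, assuming you fork the host application (the class and parameter names here are hypothetical, not spark-jobserver's real API):

import org.apache.spark.SparkContext
import org.apache.spark.sql.SparkSession

// Hypothetical host-application entry point that accepts an external
// SparkContext instead of always constructing its own
class JobRunner(externalSc: Option[SparkContext] = None) {

  // Reuse the supplied context if there is one, otherwise fall back to
  // whatever context is already active (or let Spark create one)
  private val sc: SparkContext = externalSc.getOrElse(SparkContext.getOrCreate())

  // Derive the session from that context's configuration
  val spark: SparkSession = SparkSession.builder.config(sc.getConf).getOrCreate()
}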

5 Answers

34

Deriving the SparkSession object from a SparkContext or even a SparkConf is easy; you might just find the API slightly convoluted. Here's an example (I'm using Spark 2.4, but this should work in older 2.x releases as well):

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SparkSession

// If you already have a SparkContext stored in `sc`
val spark = SparkSession.builder.config(sc.getConf).getOrCreate()

// Another example, building the SparkConf, SparkContext and SparkSession yourself
val conf = new SparkConf().setAppName("spark-test").setMaster("local[2]")
val sc = new SparkContext(conf)
val spark = SparkSession.builder.config(sc.getConf).getOrCreate()
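Since getOrCreate() reuses the already-running SparkContext rather than starting a new one, the session should wrap the very same context; a quick sanity check:

// Should hold: the session wraps the pre-existing context, not a new one
assert(spark.sparkContext eq sc)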

Hope that helps!

Rishabh
  • 1,901
  • 2
  • 19
  • 18
21

As in the example above, you cannot create a SparkSession directly because its constructor is private. Instead, you can create a SQLContext using the SparkContext, and later get the SparkSession from the SQLContext, like this:

import org.apache.spark.sql.SQLContext

val sqlContext = new SQLContext(sparkContext)
val spark = sqlContext.sparkSession
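A quick sanity check that the session wraps the same underlying context (note from the comments below that SQLContext itself is deprecated in Spark 2.x):

// The session obtained via SQLContext wraps the original SparkContext
assert(spark.sparkContext eq sparkContext)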

Hope this helps

philantrovert
  • 9,904
  • 3
  • 37
  • 61
Partha Sarathy
  • 219
  • 2
  • 4
  • 4
    When I do this in Spark 2.2, it says SQLContext is deprecated and to use SparkSession.Builder() instead – covfefe Mar 14 '18 at 22:35
  • Correct. In Spark 2, SQLContext is deprecated because everything is consolidated to the SparkSession, which is why you'd just use `SparkSession.sql()` to execute your Spark SQL, `SparkSession.sparkContext` to get the context if you need it, etc. If you're looking for Hive support (previously HiveContext), you do something like `val spark = SparkSession.builder().enableHiveSupport().getOrCreate()` – Anthony May 22 '18 at 19:23
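Spelled out, the Hive-enabled variant from the last comment looks roughly like this (a sketch; "hive-example" is just a placeholder app name):

import org.apache.spark.sql.SparkSession

// enableHiveSupport() requires a Spark build with Hive support on the classpath
val spark = SparkSession.builder()
  .appName("hive-example") // hypothetical app name
  .enableHiveSupport()
  .getOrCreate()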
13

Apparently there is no way to initialize a SparkSession from an existing SparkContext.

Stefan Repcek
  • 2,553
  • 4
  • 21
  • 29
7
public JavaSparkContext getSparkContext()
{
    // Build a fresh context; local[*] uses all available cores
    SparkConf conf = new SparkConf()
            .setAppName("appName")
            .setMaster("local[*]");
    JavaSparkContext jsc = new JavaSparkContext(conf);
    return jsc;
}

public SparkSession getSparkSession()
{
    // The SparkSession(SparkContext) constructor is private[sql] in Scala,
    // but that qualifier is not enforced in bytecode, so Java can call it
    SparkSession sparkSession = new SparkSession(getSparkContext().sc());
    return sparkSession;
}


You can also try using the builder:

public SparkSession getSparkSession()
{
    SparkConf conf = new SparkConf()
            .setAppName("appName")
            .setMaster("local");

    // getOrCreate() returns the active session if one already exists
    SparkSession sparkSession = SparkSession
            .builder()
            .config(conf)
            .getOrCreate();
    return sparkSession;
}
  • 1
    in your second method you don't use any existing SparkContext; in Scala I can't construct a SparkSession like in your getSparkSession() – Stefan Repcek May 10 '17 at 20:30
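As the comment says, the constructor route is Java-only: the constructor is declared private[sql] in Scala, and that qualifier is enforced by the Scala compiler rather than the JVM, which is why the Java code above compiles. A Scala-side sketch of the equivalent using the builder:

import org.apache.spark.SparkContext
import org.apache.spark.sql.SparkSession

// From Scala the direct constructor is off limits, so derive the session
// from the existing context's configuration via the builder instead
def getSparkSession(sc: SparkContext): SparkSession =
  SparkSession.builder.config(sc.getConf).getOrCreate()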
5
val sparkSession = SparkSession.builder.config(sc.getConf).getOrCreate()
lostsoul29
  • 746
  • 2
  • 11
  • 19