10

I have the following worksheet in IntelliJ:

import org.apache.spark.sql.SQLContext
import org.apache.spark.{SparkConf, SparkContext}

/** Lazily instantiated singleton instance of SQLContext */
object SQLContextSingleton {
  @transient  private var instance: SQLContext = _
  def getInstance(sparkContext: SparkContext): SQLContext = {
    if (instance == null) {
      instance = new SQLContext(sparkContext)
    }
    instance
  }
}

val conf = new SparkConf().
  setAppName("Scala Wooksheet").
  setMaster("local[*]")
val sc = new SparkContext(conf)
val sqlContext = new SQLContext(sc)
val df = sqlContext.read.json("/Users/someuser/some.json")
df.show

This code works in the REPL, but it seems to run only the first time (with some other errors). Each subsequent run gives the error:

16/04/13 11:04:57 WARN SparkContext: Another SparkContext is being constructed (or threw an exception in its constructor).  This may indicate an error, since only one SparkContext may be running in this JVM (see SPARK-2243). The other SparkContext was created at:
org.apache.spark.SparkContext.<init>(SparkContext.scala:82)

How can I find the context already in use?

Note: I hear others say to use conf.set("spark.driver.allowMultipleContexts","true"), but that looks like a workaround that just grows memory usage (like uncollected garbage) rather than a real fix.
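
For reference, here is a minimal sketch of what that workaround would look like in the worksheet (it just adds the setting to the same conf as above; I would rather not use it, for the reason just mentioned):

import org.apache.spark.{SparkConf, SparkContext}

// Workaround only: this silences the "one SparkContext per JVM" check,
// but every worksheet re-run still leaves another context alive.
val conf = new SparkConf().
  setAppName("Scala Worksheet").
  setMaster("local[*]").
  set("spark.driver.allowMultipleContexts", "true")
val sc = new SparkContext(conf)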

Is there a better way?

codeaperature
  • I think if you add `sc.close()` as the last line in the worksheet, you'll be OK - each execution would create a SparkContext and close it, so there won't be more than one running. – Tzach Zohar Apr 13 '16 at 18:22
  • @TzachZohar -- It seems that sc does not have a close method. – codeaperature Apr 13 '16 at 18:29
  • Oops, meant `stop()`, sorry – Tzach Zohar Apr 13 '16 at 18:30
  • @TzachZohar - Thanks ... I still need to ensure I don't crash before getting to that point, probably with try / catch / finally (see the sketch after these comments). There must be a more common or elegant solution. (???) – codeaperature Apr 14 '16 at 21:39
  • Another thought ... maybe the question is not about closing the SparkContext but "How is it possible to find the SparkContext that is already open?" – codeaperature Apr 26 '16 at 20:51
  • In that case - AFAIK there's no way to do that. – Tzach Zohar Apr 27 '16 at 06:06
  • Others are asking the same question. See http://stackoverflow.com/questions/41673393/using-apache-spark-in-intellij-scala-worksheet and http://stackoverflow.com/questions/32189206/how-to-setup-intellij-14-scala-worksheet-to-run-spark -- Still searching ... – codeaperature Feb 16 '17 at 06:00
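
Following up on the comment thread, here is a minimal sketch of the stop-in-finally idea, using the same job as in the question (nothing here beyond the standard SparkContext API):

import org.apache.spark.sql.SQLContext
import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf().
  setAppName("Scala Worksheet").
  setMaster("local[*]")
val sc = new SparkContext(conf)

try {
  // Do the actual work inside the try block...
  val sqlContext = new SQLContext(sc)
  val df = sqlContext.read.json("/Users/someuser/some.json")
  df.show
} finally {
  // ...so the context is stopped even if the job throws, and no
  // running SparkContext is left behind for the next worksheet run.
  sc.stop()
}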

2 Answers

11

I was having the same problem trying to get Spark code to execute in a Scala Worksheet in IntelliJ IDEA (CE 2016.3.4).

The solution for the duplicate Spark context creation was to uncheck the 'Run worksheet in the compiler process' checkbox in Settings -> Languages and Frameworks -> Scala -> Worksheet. I also tested the other Worksheet settings; they had no effect on the duplicate-context problem.

I did not put sc.stop() in the Worksheet either, but I did have to set the master and appName parameters in the conf for it to work.

Here is the Worksheet version of the code from SimpleApp.scala in the Spark Quick Start guide:

import org.apache.spark.{SparkConf, SparkContext}

// Both master and appName must be set on the conf for the Worksheet run to work
val conf = new SparkConf()
conf.setMaster("local[*]")
conf.setAppName("Simple Application")

val sc = new SparkContext(conf)

// Count the lines of the Spark README that contain the letters "a" and "b"
val logFile = "/opt/spark-latest/README.md"
val logData = sc.textFile(logFile).cache()
val numAs = logData.filter(line => line.contains("a")).count()
val numBs = logData.filter(line => line.contains("b")).count()

println(s"Lines with a: $numAs, Lines with b: $numBs")

I used the same simple.sbt from the guide for importing the dependencies into IntelliJ IDEA.
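
For reference, that simple.sbt follows the layout from the Quick Start guide, roughly like the sketch below (the Scala and Spark version numbers here are placeholders; match them to your own installation):

name := "Simple Project"

version := "1.0"

scalaVersion := "2.11.8"

// Placeholder version: match it to the Spark installed at /opt/spark-latest
libraryDependencies += "org.apache.spark" %% "spark-core" % "2.1.0"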

Here is a screenshot of the functioning Scala Worksheet with Spark:

UPDATE for IntelliJ CE 2017.1 (Worksheet in REPL mode)

In 2017.1, IntelliJ introduced a REPL mode for the Worksheet. I have tested the same code with the 'Use REPL' option checked. For this mode to run, you need to leave the 'Run worksheet in the compiler process' checkbox described above checked (it is checked by default).

The code runs fine in Worksheet REPL mode.

Here is a screenshot: Apache Spark running in an IntelliJ Scala Worksheet in REPL mode

tomaskazemekas
  • The solution for the duplicate Spark context creation was to uncheck 'Run worksheet in the compiler process' checkbox in Settings -> Languages and Frameworks -> Scala -> Worksheet. This worked for me. Another way is to create a simple Scala object and put Spark code in there. – jrook Mar 11 '17 at 21:41
  • Thank you @jrook you saved me a lot of time (y) – elarib Mar 22 '17 at 20:12
1

As detectivebag stated in this git post, you can fix this problem by switching your worksheets to run in 'eclipse compatibility mode' only:

1) Open Preferences

2) Under Languages and Frameworks, select Scala

3) Under the Worksheet tab, uncheck everything except 'Use "eclipse compatibility" mode'

Logister