Some advice on how to debug Spark and sbt.
How to build Spark in IntelliJ.
Clone https://github.com/apache/spark and open it in IntelliJ as an sbt project.
I had to execute sbt compile and re-open the project before I could run my code in IntelliJ (before that I got the error: object SqlBaseParser is not a member of package org.apache.spark.sql.catalyst.parser). For example, I can put the following object in sql/core/src/main/scala and run/debug it in IntelliJ:
// scalastyle:off
import org.apache.spark.sql.{Dataset, SparkSession}
object MyMain extends App {
  val spark = SparkSession.builder()
    .master("local")
    .appName("SparkTestApp")
    .getOrCreate()
  case class Person(id: Long, name: String)
  import spark.implicits._
  val df: Dataset[Person] = spark.range(10).map(i => Person(i, i.toString))
  df.show()
  //+---+----+
  //| id|name|
  //+---+----+
  //|  0|   0|
  //|  1|   1|
  //|  2|   2|
  //|  3|   3|
  //|  4|   4|
  //|  5|   5|
  //|  6|   6|
  //|  7|   7|
  //|  8|   8|
  //|  9|   9|
  //+---+----+
}
I also clicked Run npm install and Load Maven project when these pop-ups appeared, but I didn't notice any difference.
Also, once I had to keep in Project Structure, under sql/catalyst/target/scala-2.12/src_managed, only one source root: sql/catalyst/target/scala-2.12/src_managed/main (and not sql/catalyst/target/scala-2.12/src_managed/main/antlr4). Before that I had errors like: SqlBaseLexer is already defined as class SqlBaseLexer.
Build Apache Spark Source Code with IntelliJ IDEA: https://yujheli-wordpress-com.translate.goog/2020/03/26/build-apache-spark-source-code-with-intellij-idea/?_x_tr_sl=auto&_x_tr_tl=en&_x_tr_hl=uk&_x_tr_pto=wapp (original in Chinese: https://yujheli.wordpress.com/2020/03/26/build-apache-spark-source-code-with-intellij-idea/ )
Why does building Spark sources give "object sbt is not a member of package com.typesafe"?
How to build sbt in IntelliJ.
sbt itself is tricky https://www.lihaoyi.com/post/SowhatswrongwithSBT.html and building it is a little tricky too.
Clone https://github.com/sbt/sbt , open it in IntelliJ. Let's try to run the previous Spark code using this cloned sbt.
sbt does not seem to be designed to run against an arbitrary specified directory. I put the following object in client/src/main/scala
object MyClient extends App {
  System.setProperty("user.dir", "../spark")
  sbt.client.Client.main(Array("sql/runMain MyMain"))
}
(Generally, mutating the system property user.dir is not recommended: How to use "cd" command using Java runtime?)
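As an aside on the user.dir remark above: when the goal is only to run an external command in another directory, one alternative is to pass the directory explicitly to the child process instead of mutating a JVM-wide property. The following is a minimal sketch using scala.sys.process; the names RunInDir and runIn are hypothetical and are not part of sbt or Spark.

```scala
import java.io.File
import scala.sys.process.Process

object RunInDir {
  // Runs `command` with `dir` as the working directory of the child process
  // and returns its trimmed standard output. Throws if the exit code is non-zero.
  def runIn(dir: File, command: Seq[String]): String =
    Process(command, dir).!!.trim
}
```

For example, RunInDir.runIn(new File("/path_to_spark/spark"), Seq("sbt", "sql/runMain MyMain")) would start a separately installed sbt in the Spark checkout (assuming sbt is on the PATH). This does not help when sbt has to run inside the current JVM, which is the case the rest of this answer deals with.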
I had to execute sbt compile first (this includes the command sbt generateContrabands; sbt uses the sbt plugin sbt-contraband (ContrabandPlugin, JsonCodecPlugin), formerly sbt-datatype, for code generation: https://github.com/sbt/contraband https://www.scala-sbt.org/contraband/ https://www.scala-sbt.org/1.x/docs/Datatype.html https://github.com/eed3si9n/gigahorse/tree/develop/core/src/main/contraband). Before that I had the error: not found: value ScalaKeywords.
The next error is: type ExcludeItem is not a member of package sbt.internal.bsp. You can just remove the files ExcludeItemFormats.scala, ExcludesItemFormats.scala, ExcludesParamsFormats.scala and ExcludesResultFormats.scala from protocol/src/main/contraband-scala/sbt/internal/bsp/codec. They are outdated auto-generated files. You can check this: if you delete the contents of the directory protocol/src/main/contraband-scala (the root for auto-generated sources) and execute sbt generateContrabands, all the files except these four will be restored. For some reason these files don't confuse sbt but do confuse IntelliJ.
Now, when run, MyClient produces
//[info] +---+----+
//[info] | id|name|
//[info] +---+----+
//[info] |  0|   0|
//[info] |  1|   1|
//[info] |  2|   2|
//[info] |  3|   3|
//[info] |  4|   4|
//[info] |  5|   5|
//[info] |  6|   6|
//[info] |  7|   7|
//[info] |  8|   8|
//[info] |  9|   9|
//[info] +---+----+
sbt.client.Client is called the thin client. Alternatively, you can publish it locally and use it as a dependency.
build.sbt (https://github.com/sbt/sbt/blob/v1.8.0/build.sbt#L1160)
lazy val sbtClientProj = (project in file("client"))
  .enablePlugins(NativeImagePlugin)
  .dependsOn(commandProj)
  .settings(
    commonBaseSettings,
    scalaVersion := "2.12.11",
    publish / skip := false, // change true to false
    name := "sbt-client",
    .......
sbt publishLocal
A new project:
build.sbt
scalaVersion := "2.12.17"
// ~/.ivy2/local/org.scala-sbt/sbt-client/1.8.1-SNAPSHOT/jars/sbt-client.jar
libraryDependencies += "org.scala-sbt" % "sbt-client" % "1.8.1-SNAPSHOT"
src/main/scala/Main.scala
object Main extends App {
  System.setProperty("user.dir", "../spark")
  sbt.client.Client.main(Array("sql/runMain MyMain"))
  //[info] +---+----+
  //[info] | id|name|
  //[info] +---+----+
  //[info] |  0|   0|
  //[info] |  1|   1|
  //[info] |  2|   2|
  //[info] |  3|   3|
  //[info] |  4|   4|
  //[info] |  5|   5|
  //[info] |  6|   6|
  //[info] |  7|   7|
  //[info] |  8|   8|
  //[info] |  9|   9|
  //[info] +---+----+
}
But the thin client is not how sbt normally runs. sbt.xMain from your stack trace is from https://github.com/sbt/sbt ; it's here: https://github.com/sbt/sbt/blob/1.8.x/main/src/main/scala/sbt/Main.scala#L44 . But xsbt.boot.Boot from the stack trace is not from this repo; it's from https://github.com/sbt/launcher , namely https://github.com/sbt/launcher/blob/1.x/launcher-implementation/src/main/scala/xsbt/boot/Boot.scala
The thing is that sbt runs in two steps. The sbt executable (usually downloaded from https://www.scala-sbt.org/download.html#universal-packages) is a shell script. First, it runs sbt-launch.jar (the object xsbt.boot.Boot):
https://github.com/sbt/sbt/blob/v1.8.0/sbt#L507-L512
execRunner "$java_cmd" \
  "${java_args[@]}" \
  "${sbt_options[@]}" \
  -jar "$sbt_jar" \
  "${sbt_commands[@]}" \
  "${residual_args[@]}"
Second, the launcher (xsbt.boot.Boot) reflectively calls sbt (the class sbt.xMain):
https://github.com/sbt/launcher/blob/v1.4.1/launcher-implementation/src/main/scala/xsbt/boot/Launch.scala#L147-L149
val main = appProvider.newMain()
try {
  withContextLoader(appProvider.loader)(main.run(appConfig))
https://github.com/sbt/launcher/blob/v1.4.1/launcher-implementation/src/main/scala/xsbt/boot/Launch.scala#L496
// implementation of the above appProvider.newMain()
else if (AppMainClass.isAssignableFrom(entryPoint)) mainClass.newInstance
https://github.com/sbt/launcher/blob/v1.4.1/launcher-implementation/src/main/scala/xsbt/boot/PlainApplication.scala#L13
// implementation of the above main.run(appConfig)
mainMethod.invoke(null, configuration.arguments).asInstanceOf[xsbti.Exit]
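Then xMain#run (the method of the class xMain), via XMainConfiguration#run, reflectively calls xMain.run (the method of the object xMain, loaded as the class sbt.xMain$):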
https://github.com/sbt/sbt/blob/v1.8.0/main/src/main/scala/sbt/Main.scala#L44-L47
class xMain extends xsbti.AppMain {
  def run(configuration: xsbti.AppConfiguration): xsbti.MainResult =
    new XMainConfiguration().run("xMain", configuration)
}
https://github.com/sbt/sbt/blob/v1.8.0/main/src/main/java/sbt/internal/XMainConfiguration.java#L51-L57
Class<?> clazz = loader.loadClass("sbt." + moduleName + "$");
Object instance = clazz.getField("MODULE$").get(null);
Method runMethod = clazz.getMethod("run", xsbti.AppConfiguration.class);
try {
  .....
  return (xsbti.MainResult) runMethod.invoke(instance, updatedConfiguration);
Then it downloads and runs the required version of Scala (specified in build.sbt) and the required version of the rest of sbt (specified in project/build.properties).
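The reflective call on a Scala object quoted above from XMainConfiguration.java can be reproduced in isolation. The following is a minimal sketch, not sbt's code: Greeter and ReflectiveModuleCall are hypothetical names, and the only technique taken from the snippet above is loading the class whose name ends with "$", reading its static MODULE$ field and invoking a method on that instance.

```scala
// Hypothetical stand-in for a Scala `object` used as an entry point.
object Greeter {
  def run(name: String): String = s"Hello, $name"
}

object ReflectiveModuleCall {
  // Loads the JVM class behind a Scala `object` (its name ends with '$'),
  // reads the singleton instance from the static MODULE$ field and invokes
  // a method that takes a single String argument on it.
  def call(moduleClassName: String, methodName: String, arg: String): AnyRef = {
    val clazz = Class.forName(moduleClassName)
    val instance = clazz.getField("MODULE$").get(null)
    val method = clazz.getMethod(methodName, classOf[String])
    method.invoke(instance, arg)
  }
}
```

Calling ReflectiveModuleCall.call(Greeter.getClass.getName, "run", "sbt") goes through the same three steps (loadClass, MODULE$, invoke) as the Java snippet, just with the default class loader instead of a custom one.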
What is the launcher.
Let's consider a hello-world example for the launcher.
The launcher consists of a library (interfaces)
https://mvnrepository.com/artifact/org.scala-sbt/launcher-interface
https://github.com/sbt/launcher/tree/1.x/launcher-interface
and the launcher runnable jar
https://mvnrepository.com/artifact/org.scala-sbt/launcher
https://github.com/sbt/launcher/tree/1.x/launcher-implementation/src
Create a project (depending on the launcher interfaces at compile time)
build.sbt
lazy val root = (project in file("."))
  .settings(
    name := "scalademo",
    organization := "com.example",
    version := "0.1.0-SNAPSHOT",
    scalaVersion := "2.13.10",
    libraryDependencies ++= Seq(
      "org.scala-sbt" % "launcher-interface" % "1.4.1" % Provided,
    ),
  )
src/main/scala/mypackage/Main.scala (this class will be an entry point while working with the launcher)
package mypackage
import xsbti.{AppConfiguration, AppMain, Exit, MainResult}
class Main extends AppMain {
  def run(configuration: AppConfiguration): MainResult = {
    val scalaVersion = configuration.provider.scalaProvider.version
    println(s"Hello, World! Running Scala $scalaVersion")
    configuration.arguments.foreach(println)
    new Exit {
      override val code: Int = 0
    }
  }
}
Run sbt publishLocal. The project jar will be published to ~/.ivy2/local/com.example/scalademo_2.13/0.1.0-SNAPSHOT/jars/scalademo_2.13.jar
Download the launcher runnable jar: https://repo1.maven.org/maven2/org/scala-sbt/launcher/1.4.1/launcher-1.4.1.jar
Create launcher configuration
my.app.configuration
[scala]
version: 2.13.10
[app]
org: com.example
name: scalademo
version: 0.1.0-SNAPSHOT
class: mypackage.Main
cross-versioned: binary
[repositories]
local
maven-central
[boot]
directory: ${user.home}/.myapp/boot
Then the command java -jar launcher-1.4.1.jar @my.app.configuration a b c produces
//Hello, World! Running Scala 2.13.10
//a
//b
//c
The following files appeared:
~/.myapp/boot/scala-2.13.10/com.example/scalademo/0.1.0-SNAPSHOT
  scalademo_2.13.jar
  scala-library-2.13.10.jar
~/.myapp/boot/scala-2.13.10/lib
  java-diff-utils-4.12.jar
  jna-5.9.0.jar
  jline-3.21.0.jar
  scala-library.jar
  scala-compiler.jar
  scala-reflect.jar
So the launcher helps to run an application in environments with only Java installed (Scala is not necessary); Ivy dependency resolution is used. There are also features to handle return codes, reboot the application with a different Scala version, launch servers, etc.
Alternatively, any of the following commands can be used
java -Dsbt.boot.properties=my.app.configuration -jar launcher-1.4.1.jar
java -jar launcher-repacked.jar # put my.app.configuration to sbt/sbt.boot.properties/ and repack the jar
https://www.scala-sbt.org/1.x/docs/Launcher-Getting-Started.html
How to run sbt with the launcher.
sbt ( https://github.com/sbt/sbt ) uses the sbt plugin SbtLauncherPlugin ( https://github.com/sbt/sbt/blob/v1.8.0/project/SbtLauncherPlugin.scala ) so that from the raw launcher (the artifact launcher)
https://github.com/sbt/launcher/tree/1.x/launcher-implementation/src
https://mvnrepository.com/artifact/org.scala-sbt/launcher
it builds sbt-launch
https://github.com/sbt/sbt/tree/v1.8.0/launch
https://mvnrepository.com/artifact/org.scala-sbt/sbt-launch
Basically, sbt-launch differs from launcher in having a default config, sbt.boot.properties, injected.
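One way to see this difference for a concrete jar is to look for the injected config entry. The sketch below is hedged: the entry path sbt/sbt.boot.properties is an assumption taken from the repacking note earlier in this answer, and BootPropertiesCheck is a hypothetical helper, not part of sbt or the launcher.

```scala
import java.io.File
import java.util.jar.JarFile

object BootPropertiesCheck {
  // Assumed location of the default launcher config inside the jar.
  val DefaultConfigEntry = "sbt/sbt.boot.properties"

  // Returns true if the given jar contains the default config entry.
  def hasDefaultConfig(jar: File): Boolean = {
    val jarFile = new JarFile(jar)
    try jarFile.getEntry(DefaultConfigEntry) != null
    finally jarFile.close()
  }
}
```

Under that assumption, BootPropertiesCheck.hasDefaultConfig(new File("sbt-launch.jar")) would be expected to differ from the result for launcher-1.4.1.jar.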
If we'd like to run sbt with the launcher, we need a way to specify a working directory for sbt (similarly to what we did with the thin client).
The working directory can be set either 1) in sbt.xMain (sbt) or 2) in xsbt.boot.Boot (sbt-launcher).
1)
Make sbt.xMain non-final so that it can be extended
/*final*/ class xMain extends xsbti.AppMain {
...........
https://github.com/sbt/sbt/blob/v1.8.0/main/src/main/scala/sbt/Main.scala#L44
Put a new class into main/src/main/scala (a launcher-style entry point)
import sbt.xMain
import xsbti.{ AppConfiguration, AppProvider, MainResult }
import java.io.File
class MyXMain extends xMain {
  override def run(configuration: AppConfiguration): MainResult = {
    val args = configuration.arguments
    val (dir, rest) =
      if (args.length >= 1 && args(0).startsWith("dir=")) {
        (
          Some(args(0).stripPrefix("dir=")),
          args.drop(1)
        )
      } else {
        (None, args)
      }
    dir.foreach { dir =>
      System.setProperty("user.dir", dir)
    }
    // xMain.run(new AppConfiguration { // not ok
    // new xMain().run(new AppConfiguration { // not ok
    super[xMain].run(new AppConfiguration { // ok
      override val arguments: Array[String] = rest
      override val baseDirectory: File =
        dir.map(new File(_)).getOrElse(configuration.baseDirectory)
      override val provider: AppProvider = configuration.provider
    })
  }
}
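The argument handling in MyXMain can be checked without building sbt. The following is a minimal standalone sketch of the same idea (split off a leading dir=... argument, pass the rest through); DirArg is a hypothetical name and the code does not depend on sbt or the launcher interfaces.

```scala
object DirArg {
  // Splits off a leading "dir=<path>" argument if present.
  // Returns the optional directory and the remaining arguments.
  def split(args: Array[String]): (Option[String], Array[String]) =
    args.headOption.filter(_.startsWith("dir=")) match {
      case Some(first) => (Some(first.stripPrefix("dir=")), args.drop(1))
      case None        => (None, args)
    }
}
```

As in MyXMain, only the first argument is inspected, so a dir=... that appears later on the command line is passed to sbt unchanged.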
sbt publishLocal
my.sbt.configuration
[scala]
version: auto
#version: 2.12.17
[app]
org: org.scala-sbt
name: sbt
#name: main # not ok
version: 1.8.1-SNAPSHOT
class: MyXMain
#class: sbt.xMain
components: xsbti,extra
cross-versioned: false
#cross-versioned: binary
[repositories]
local
maven-central
[boot]
directory: ${user.home}/.mysbt/boot
[ivy]
ivy-home: ${user.home}/.ivy2
A command:
java -jar launcher-1.4.1.jar @my.sbt.configuration dir=/path_to_spark/spark "sql/runMain MyMain"
or
java -jar sbt-launch.jar @my.sbt.configuration dir=/path_to_spark/spark "sql/runMain MyMain"
//[info] +---+----+
//[info] | id|name|
//[info] +---+----+
//[info] |  0|   0|
//[info] |  1|   1|
//[info] |  2|   2|
//[info] |  3|   3|
//[info] |  4|   4|
//[info] |  5|   5|
//[info] |  6|   6|
//[info] |  7|   7|
//[info] |  8|   8|
//[info] |  9|   9|
//[info] +---+----+
(sbt-launch.jar is taken from ~/.ivy2/local/org.scala-sbt/sbt-launch/1.8.1-SNAPSHOT/jars or just from https://mvnrepository.com/artifact/org.scala-sbt/sbt-launch , since we haven't modified the launcher yet.)
I had to copy scalastyle-config.xml from spark, otherwise it wasn't found.
I still get the warning: fatal: Not a git repository (or any parent up to mount parent ...) Stopping at filesystem boundary (GIT_DISCOVERY_ACROSS_FILESYSTEM not set).
2)
project/Dependencies.scala (https://github.com/sbt/sbt/blob/v1.8.0/project/Dependencies.scala#L25)
val launcherVersion = "1.4.2-SNAPSHOT" // modified
Clone https://github.com/sbt/launcher and make the following changes
build.sbt (https://github.com/sbt/launcher/blob/v1.4.1/build.sbt#L11)
ThisBuild / version := {
  val orig = (ThisBuild / version).value
  if (orig.endsWith("-SNAPSHOT")) "1.4.2-SNAPSHOT" // modified
  else orig
}
launcher-implementation/src/main/scala/xsbt/boot/Launch.scala
(https://github.com/sbt/launcher/blob/v1.4.1/launcher-implementation/src/main/scala/xsbt/boot/Launch.scala#L17
#L21)
class LauncherArguments(
  val args: List[String],
  val isLocate: Boolean,
  val isExportRt: Boolean,
  val dir: Option[String] = None // added
)
object Launch {
  def apply(arguments: LauncherArguments): Option[Int] =
    apply((new File(arguments.dir.getOrElse(""))).getAbsoluteFile, arguments) // modified
  .............
launcher-implementation/src/main/scala/xsbt/boot/Boot.scala (https://github.com/sbt/launcher/blob/v1.4.1/launcher-implementation/src/main/scala/xsbt/boot/Boot.scala#L41-L67)
def parseArgs(args: Array[String]): LauncherArguments = {
  @annotation.tailrec
  def parse(
    args: List[String],
    isLocate: Boolean,
    isExportRt: Boolean,
    remaining: List[String],
    dir: Option[String] // added
  ): LauncherArguments =
    args match {
      ...................
      case "--locate" :: rest => parse(rest, true, isExportRt, remaining, dir) // modified
      case "--export-rt" :: rest => parse(rest, isLocate, true, remaining, dir) // modified
      // added
      case "--mydir" :: next :: rest => parse(rest, isLocate, isExportRt, remaining, Some(next))
      case next :: rest => parse(rest, isLocate, isExportRt, next :: remaining, dir) // modified
      case Nil => new LauncherArguments(remaining.reverse, isLocate, isExportRt, dir) // modified
    }
  parse(args.toList, false, false, Nil, None)
}
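To try the --mydir parsing on its own, without publishing a modified launcher, here is a minimal standalone sketch of the same tail-recursive pattern. MyDirArgs and Parsed are hypothetical names; the sketch covers only the flags visible in the snippet above and none of the cases elided there.

```scala
object MyDirArgs {
  // Simplified stand-in for the launcher's LauncherArguments.
  final case class Parsed(
    args: List[String],
    isLocate: Boolean,
    isExportRt: Boolean,
    dir: Option[String]
  )

  def parse(args: List[String]): Parsed = {
    @annotation.tailrec
    def loop(rest: List[String], acc: Parsed): Parsed = rest match {
      case "--locate" :: tail        => loop(tail, acc.copy(isLocate = true))
      case "--export-rt" :: tail     => loop(tail, acc.copy(isExportRt = true))
      case "--mydir" :: path :: tail => loop(tail, acc.copy(dir = Some(path)))
      // Anything else is accumulated in reverse order as a regular argument.
      case other :: tail             => loop(tail, acc.copy(args = other :: acc.args))
      case Nil                       => acc.copy(args = acc.args.reverse)
    }
    loop(args, Parsed(Nil, isLocate = false, isExportRt = false, dir = None))
  }
}
```

Note that a trailing --mydir with no value does not match the third case and is passed through as a regular argument; the modified parseArgs above behaves the same way for that input.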
sbt-launcher: sbt publishLocal
sbt: sbt publishLocal
my.sbt.configuration
[scala]
version: auto
[app]
org: org.scala-sbt
name: sbt
version: 1.8.1-SNAPSHOT
#class: MyXMain
class: sbt.xMain
components: xsbti,extra
cross-versioned: false
[repositories]
local
maven-central
[boot]
directory: ${user.home}/.mysbt/boot
[ivy]
ivy-home: ${user.home}/.ivy2
A command:
java -jar launcher-1.4.2-SNAPSHOT.jar @my.sbt.configuration --mydir /path_to_spark/spark "sql/runMain MyMain"
or
java -jar sbt-launch.jar @my.sbt.configuration --mydir /path_to_spark/spark "sql/runMain MyMain"
or
java -jar sbt-launch.jar --mydir /path_to_spark/spark "sql/runMain MyMain"
(using the default sbt.boot.properties rather than my.sbt.configuration)
(here we're using the modified launcher, or a new sbt-launch built with this modified launcher).
Alternatively, we can specify "program arguments" in the "Run configuration" for xsbt.boot.Boot in IntelliJ
@/path_to_sbt_config/my.sbt.configuration --mydir /path_to_spark/spark "sql/runMain MyMain"
It's also possible to specify the working directory /path_to_spark/spark in the "Run configuration" in IntelliJ. Then the remaining "program arguments" are
@/path_to_sbt_config/my.sbt.configuration "sql/runMain MyMain"
I tried to use "org.scala-sbt" % "launcher" % "1.4.2-SNAPSHOT" or "org.scala-sbt" % "sbt-launch" % "1.8.1-SNAPSHOT" as a dependency but got the error: No RuntimeVisibleAnnotations in classfile with ScalaSignature attribute: class Boot.
Your setting.
So we can run/debug the sbt-launcher code in IntelliJ and/or with printlns, and run/debug the sbt code with printlns (because there is no runnable object).
From your stack trace I suspect that one of the classloader URLs is null
https://github.com/openjdk/jdk/blob/jdk8-b120/jdk/src/share/classes/sun/misc/URLClassPath.java#L82
Maybe you can add to sbt.xMain#run or MyXMain#run something like
import java.net.URLClassLoader

// Walk up the classloader chain and print the URLs of every URLClassLoader
var cl = getClass.getClassLoader
while (cl != null) {
  println(s"classloader: ${cl.getClass.getName}")
  cl match {
    case cl: URLClassLoader =>
      println("classloader urls:")
      cl.getURLs.foreach(println)
    case _ =>
      println("not URLClassLoader")
  }
  cl = cl.getParent
}
in order to see which URL is null.
https://www.scala-sbt.org/1.x/docs/Developers-Guide.html
https://github.com/sbt/sbt/blob/1.8.x/DEVELOPING.md