
I am learning Scala on Docker, and the image doesn't have sbt or Maven on it. I am facing the error below, and all of the solutions I can find on the internet involve sbt or Maven, so I was wondering whether this can be handled without sbt or Maven.
I wanted to create the jar using:

scalac problem1.scala -d problem1.jar

Error:
problem1.scala:3: error: object apache is not a member of package org
import org.apache.spark.SparkContext

Code:

import org.apache.spark.SparkContext
import org.apache.spark.SparkConf
import org.apache.log4j.{Logger,Level}
import org.apache.spark.sql.SQLContext
import org.apache.spark.sql.functions._
import org.apache.spark.sql.functions.lit
import org.apache.spark.sql.{Row, SparkSession}
import org.apache.spark.sql.types.{StructType, StructField,  LongType, StringType}
//import org.apache.parquet.format.StringType

object problem1 {
  def main(args: Array[String]) {
    Logger.getLogger("org").setLevel(Level.OFF)
    //Create conf object
    val conf = new SparkConf().setMaster("local[2]").setAppName("loadData")
    //create spark context object
    val sc = new SparkContext(conf)

    val SQLContext = new SQLContext(sc)
    import SQLContext.implicits._

    //Read file and create RDD
    val table_schema = StructType(Seq(
      StructField("TransID", LongType, true),
      StructField("CustID", LongType, true),
      StructField("TransTotal", LongType, true),
      StructField("TransNumItems", LongType, true),
      StructField("TransDesc", StringType, true)
    ))
    val T = SQLContext.read
      .format("csv")
      .schema(table_schema)
      .option("header","false")
      .option("nullValue","NA")
      .option("delimiter",",")
      .load(args(0))
    //    T.show(5)

    val T1 = T.filter($"TransTotal" >= 200)
    //    T1.show(5)
    val T2 = T1.groupBy("TransNumItems").agg(sum("TransTotal"), avg("TransTotal"),
      min("TransTotal"), max("TransTotal"))
    //    T2.show(500)
    T2.show()
    val T3 =  T1.groupBy("CustID").agg(count("TransID").as("number_of_transactions_T3"))
    //    T3.show(50)
    val T4 = T.filter($"TransTotal" >= 600)
    //   T4.show(5)
    val T5 = T4.groupBy("CustID").agg(count("TransID").as("number_of_transactions_T5"))
    //    T5.show(50)
    val temp = T3.as("T3").join(T5.as("T5"), ($"T3.CustID" === $"T5.CustID") )
    //    T6.show(5)
    //    print(T6.count())
    val T6 = temp.where(($"number_of_transactions_T5")*5 < $"number_of_transactions_T3")
    //    T6.show(5)
    T6.show()
    sc.stop
  }
}

1 Answer

  • Why not choose a Docker image that comes with sbt?

  • Anyway, yes, you can certainly create a jar from the command line using plain scalac, without sbt. You need the dependency jars (spark-core, spark-catalyst, spark-sql, log4j, and maybe some others if needed) and have to specify the classpath manually:

scalac -cp /path/to/spark-core_2.13-3.3.1.jar:/path/to/spark-catalyst_2.13-3.3.1.jar:/path/to/spark-sql_2.13-3.3.1.jar:/path/to/log4j-1.2-api-2.17.2.jar -d problem1.jar problem1.scala

For example, for me the paths are the following:

scalac -cp /home/dmitin/.cache/coursier/v1/https/repo1.maven.org/maven2/org/apache/spark/spark-core_2.13/3.3.1/spark-core_2.13-3.3.1.jar:/home/dmitin/.cache/coursier/v1/https/repo1.maven.org/maven2/org/apache/spark/spark-catalyst_2.13/3.3.1/spark-catalyst_2.13-3.3.1.jar:/home/dmitin/.cache/coursier/v1/https/repo1.maven.org/maven2/org/apache/spark/spark-sql_2.13/3.3.1/spark-sql_2.13-3.3.1.jar:/home/dmitin/.cache/coursier/v1/https/repo1.maven.org/maven2/org/apache/logging/log4j/log4j-1.2-api/2.17.2/log4j-1.2-api-2.17.2.jar -d problem1.jar problem1.scala 
  • Alternatively, somewhere where you do have sbt, you can create a fat jar (sbt assembly) with all dependencies (or even with your application and all dependencies) and use it for compilation (a minimal build.sbt sketch follows the link below):
scalac -cp fat-jar.jar -d problem1.jar problem1.scala

https://github.com/sbt/sbt-assembly
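
For reference, here is a minimal sketch of such an sbt-assembly project, assuming Scala 2.13 and Spark 3.3.1 as in the commands above; the plugin and library versions are only illustrative:

// project/plugins.sbt
addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "2.1.1")

// build.sbt
ThisBuild / scalaVersion := "2.13.10"

libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core" % "3.3.1",
  "org.apache.spark" %% "spark-sql"  % "3.3.1"
)

Running sbt assembly then produces a single jar (under target/scala-2.13/) that can be copied into the container and passed to -cp as shown above.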

  • One more option is to create an sbt launcher for your application:

https://www.scala-sbt.org/1.x/docs/Sbt-Launcher.html

SBT gives java.lang.NullPointerException when trying to run spark

The sbt launcher helps run an application in environments where only Java is installed.

  • One more option is to manage dependencies with Coursier programmatically (see the sketch after the links below):

Can you import a separate version of the same dependency into one build file for test?

How to compile and execute scala code at run-time in Scala3?

How can I run generated code during script runtime?
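
As an example, here is a minimal sketch using Coursier's high-level Scala API, assuming the io.get-coursier "coursier" library is itself available on the classpath; the Spark coordinates are the same illustrative ones as above:

import coursier._

object FetchSparkJars {
  def main(args: Array[String]): Unit = {
    // Resolve and download spark-sql 3.3.1; spark-core and spark-catalyst come in transitively
    val jars = Fetch()
      .addDependencies(dep"org.apache.spark:spark-sql_2.13:3.3.1")
      .run()

    // Print the downloaded jars as a classpath string usable with scalac -cp / java -cp
    println(jars.map(_.getAbsolutePath).mkString(java.io.File.pathSeparator))
  }
}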

  • See also https://gist.github.com/DmytroMitin/62f92bb15550d689c50276d90103c16d, https://gist.github.com/DmytroMitin/ff0b07fc93f1b3674723db1247a1467f and https://stackoverflow.com/questions/76028146/running-spark-on-sbt-console – Dmytro Mitin Apr 24 '23 at 01:53