
I am trying to create Spark Scala code that can read any file with a different number of columns. Can I dynamically write Scala/Spark code, then compile and execute it? Do I really need SBT? What is the best way to achieve this goal?

When I run the Scala code using a shell script or scalac code.scala, it says:

hadoop@namenode1:/usr/local/scala/examples$ ./survey.sh 
/usr/local/scala/examples/./survey.sh:6: error: not found: value spark
val survey = spark.read.format("com.databricks.spark.csv").option("header","true").option("nullValue","NA").option("timestampFormat","yyyy-MM-dd'T'HH:mm:ss").option("mode","failfast").option("inferchema","true").load("/tmp/survey.csv")
             ^
/usr/local/scala/examples/./survey.sh:19: error: not found: type paste
:paste
 ^
/usr/local/scala/examples/./survey.sh:37: error: not found: value udf
val parseGenderUDF = udf( parseGender _ )
                     ^
three errors found

I want something like this: dynamically generate file.scala using a shell script, then compile it using

scalac file.scala

then execute it with

scala file.scala

But is this possible? What is the way to do it?
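In outline, the kind of script I mean would look like the rough sketch below. I assume the Spark jars under /usr/local/spark/jars (the path from my install) have to be put on the classpath for scalac and scala to find SparkSession:

#!/bin/bash
# Sketch only: generate a .scala file, compile it, and run it.
# Build a classpath from the Spark distribution's jars (adjust the path for your install).
SPARK_CP=$(echo /usr/local/spark/jars/*.jar | tr ' ' ':')

# 1. Generate the source file (a fixed template here, just for illustration).
cat > Survey.scala <<'EOF'
import org.apache.spark.sql.SparkSession

object Survey {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder.master("local").appName("Survey").getOrCreate()
    spark.read.option("header", "true").csv("/tmp/survey.csv").show()
    spark.stop()
  }
}
EOF

# 2. Compile against the Spark jars.
scalac -classpath "$SPARK_CP" Survey.scala

# 3. Run the compiled object with the same classpath plus the current directory.
scala -classpath "$SPARK_CP:." Survey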

hadoop@namenode1:/usr/local/spark/examples/src/main/scala/org/apache/spark/examples$ cat Survey.scala 
import org.apache.spark.sql.{SparkSession}

object Survey {
  def main(args: Array[String]) {
    val spark = SparkSession.builder
      .master("local")
      .appName("Survey")
      .getOrCreate()

    val survey = spark.read.format("com.databricks.spark.csv")
      .option("header", "true")
      .option("nullValue", "NA")
      .option("timestampFormat", "yyyy-MM-dd'T'HH:mm:ss")
      .option("mode", "failfast")
      .option("inferSchema", "true")
      .load("/tmp/survey.csv")
    survey.show()
  }
}

The error when I compile it:

hadoop@namenode1:/usr/local/spark/examples/src/main/scala/org/apache/spark/examples$ scalac Survey.scala
Survey.scala:1: error: object apache is not a member of package org
import org.apache.spark.sql.{SparkSession}
           ^
Survey.scala:5: error: not found: value SparkSession
val spark= SparkSession.builder
           ^
two errors found
hadoop@namenode1:/usr/local/spark/examples/src/main/scala/org/apache/spark/examples$
Ashish Mishra

3 Answers


To submit Spark jobs, you either have to use the spark-submit command or execute Scala scripts in spark-shell. Apache Livy also provides a REST API for submitting Spark jobs.
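For example, if the Survey object from the question is packaged into a jar (survey.jar is only a placeholder name here), it can be submitted with:

spark-submit --class Survey --master "local[*]" survey.jar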

Apurba Pandey

You need to create a SparkSession first. For example:

import org.apache.spark.sql.{SparkSession}

val spark = SparkSession.builder
  .master("local")
  .appName("MYAPP")
  .getOrCreate()

val survey = spark.read.format("com.databricks.spark.csv")
  .option("header", "true")
  .option("nullValue", "NA")
  .option("timestampFormat", "yyyy-MM-dd'T'HH:mm:ss")
  .option("mode", "failfast")
  .option("inferSchema", "true")
  .load("/tmp/survey.csv")

For the UDF you need:

import org.apache.spark.sql.functions._
val parseGenderUDF = udf( parseGender _ )

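parseGender itself is not defined in the question, so here is a minimal sketch of what such a function and the UDF call could look like (the column name "Gender" and the normalization rules are assumptions):

import org.apache.spark.sql.functions._

// Hypothetical normalizer for free-text gender answers.
def parseGender(g: String): String = {
  val s = Option(g).getOrElse("").trim.toLowerCase
  if (s.startsWith("m")) "Male"
  else if (s.startsWith("f")) "Female"
  else "Other"
}

val parseGenderUDF = udf( parseGender _ )

// Apply it to the assumed "Gender" column of the survey DataFrame.
val cleaned = survey.withColumn("Gender", parseGenderUDF(survey("Gender")))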
I hope this helps you.

BBAH

I have found an alternative (suggested by cricket-007):

spark-shell -i survey.scala
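A script run this way does not need to build its own SparkSession, because spark-shell already provides one as spark. A minimal sketch of such a survey.scala:

// survey.scala for spark-shell -i: `spark` is predefined by the shell.
val survey = spark.read.format("com.databricks.spark.csv")
  .option("header", "true")
  .option("inferSchema", "true")
  .load("/tmp/survey.csv")
survey.show()
// Note: spark-shell stays at the interactive prompt after -i; type :quit to leave.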

But it seems this takes time while spark-shell starts up and configures itself.

And this is not what I want.

Ashish Mishra