I am trying to write Spark/Scala code that can read any input file, regardless of how many columns it has. Can I dynamically generate the Scala/Spark code, then compile and execute it? Do I really need SBT for that? What is the right way to achieve this?
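For example, this is roughly what I mean by "reading any file" (a sketch only, run inside spark-shell where the spark session already exists; /tmp/survey.csv is just my test file):

// header + inferSchema make Spark take the column names and types
// from the file itself, so the number of columns does not matter
val df = spark.read
  .option("header", "true")
  .option("inferSchema", "true")
  .csv("/tmp/survey.csv")

df.printSchema()
df.show()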
When I run the Scala code through a shell script, or with scalac code.scala, it says:
hadoop@namenode1:/usr/local/scala/examples$ ./survey.sh
/usr/local/scala/examples/./survey.sh:6: error: not found: value spark
val survey = spark.read.format("com.databricks.spark.csv").option("header","true").option("nullValue","NA").option("timestampFormat","yyyy-MM-dd'T'HH:mm:ss").option("mode","failfast").option("inferchema","true").load("/tmp/survey.csv")
^
/usr/local/scala/examples/./survey.sh:19: error: not found: type paste
:paste
^
/usr/local/scala/examples/./survey.sh:37: error: not found: value udf
val parseGenderUDF = udf( parseGender _ )
^
three errors found
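These look like spark-shell statements (the script even contains :paste), so I suspect spark, udf and :paste are only available inside the spark-shell REPL and plain scalac knows nothing about them. Maybe the generated script has to be fed to spark-shell instead, something like this (survey.scala here is just a placeholder name for a file holding only the Spark statements, no :paste):

spark-shell --master "local[*]" -i /usr/local/scala/examples/survey.scala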
What I want is to dynamically generate file.scala from a shell script, compile it with

scalac file.scala

and then execute it with

scala file.scala

Is this possible, and if so, what is the right way to do it?
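Roughly what I have in mind is the script below. It is only a sketch: the -classpath part is my guess, I am assuming the Spark jars live under /usr/local/spark/jars, and all the generated names and paths are placeholders:

#!/bin/bash
# 1. generate the Scala source, e.g. based on the header of the input file
cat > /tmp/Generated.scala <<'EOF'
import org.apache.spark.sql.SparkSession

object Generated {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder.master("local").appName("Generated").getOrCreate()
    // in the real script this read would be built from the input file's columns
    spark.read.option("header", "true").option("inferSchema", "true").csv("/tmp/survey.csv").show()
    spark.stop()
  }
}
EOF

# 2. compile it against the Spark jars (is this the right way without SBT?)
mkdir -p /tmp/classes
scalac -classpath "/usr/local/spark/jars/*" -d /tmp/classes /tmp/Generated.scala

# 3. run it -- or is spark-submit required here?
scala -classpath "/usr/local/spark/jars/*:/tmp/classes" Generated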
I also wrote it as a standalone program, Survey.scala, to compile directly with scalac:

hadoop@namenode1:/usr/local/spark/examples/src/main/scala/org/apache/spark/examples$ cat Survey.scala
import org.apache.spark.sql.SparkSession

object Survey {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder
      .master("local")
      .appName("Survey")
      .getOrCreate()

    // header + inferSchema: take column names and types from the file itself
    val survey = spark.read.format("com.databricks.spark.csv")
      .option("header", "true")
      .option("nullValue", "NA")
      .option("timestampFormat", "yyyy-MM-dd'T'HH:mm:ss")
      .option("mode", "failfast")
      .option("inferSchema", "true")
      .load("/tmp/survey.csv")

    survey.show()
  }
}
This is the error I get when I compile it with scalac:
hadoop@namenode1:/usr/local/spark/examples/src/main/scala/org/apache/spark/examples$ scalac Survey.scala
Survey.scala:1: error: object apache is not a member of package org
import org.apache.spark.sql.{SparkSession}
^
Survey.scala:5: error: not found: value SparkSession
val spark= SparkSession.builder
^
two errors found
hadoop@namenode1:/usr/local/spark/examples/src/main/scala/org/apache/spark/examples$
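My guess is that scalac simply cannot see the Spark jars, which is why even the import fails. Is the fix just to put them on the classpath and then package and submit the class, something like the commands below (again assuming the jars are under /usr/local/spark/jars), or do I really need a full SBT project for this?

mkdir -p classes
scalac -classpath "/usr/local/spark/jars/*" -d classes Survey.scala
jar cf survey.jar -C classes .
spark-submit --class Survey --master local survey.jar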