Can we write Scala/Spark generic dynamically written code

Question

I am trying to create Spark Scala code which can read any file with different number of columns. Can i dynamically write scala/spark code and compile and execute it. do i really need SBT. Whats the perfect way to achive this goal.

when i run scala code using shell script or scalac code.scala it says

hadoop@namenode1:/usr/local/scala/examples$ ./survey.sh 
/usr/local/scala/examples/./survey.sh:6: error: not found: value spark
val survey = spark.read.format("com.databricks.spark.csv").option("header","true").option("nullValue","NA").option("timestampFormat","yyyy-MM-dd'T'HH:mm:ss").option("mode","failfast").option("inferchema","true").load("/tmp/survey.csv")
             ^
/usr/local/scala/examples/./survey.sh:19: error: not found: type paste
:paste
 ^
/usr/local/scala/examples/./survey.sh:37: error: not found: value udf
val parseGenderUDF = udf( parseGender _ )
                     ^
three errors found

I want something like

dynamically generate file.scala code using shell script then complie it using

scalac file.scala

then execute it

scala file.scala

But is this possible. what is the way to do it.

hadoop@namenode1:/usr/local/spark/examples/src/main/scala/org/apache/spark/examples$ cat Survey.scala 
import org.apache.spark.sql.{SparkSession}

object Survey {
   def main(args: Array[String]) {
val spark= SparkSession.builder
  .master("local")
  .appName("Survey")
  .getOrCreate()

val survey = spark.read.format("com.databricks.spark.csv").option("header","true").option("nullValue","NA").option("timestampFormat","yyyy-MM-dd'T'HH:mm:ss").option("mode","failfast").option("inferchema","true").load("/tmp/survey.csv")
survey.show()
}
}

error when executed

hadoop@namenode1:/usr/local/spark/examples/src/main/scala/org/apache/spark/examples$ scalac Survey.scala
    Survey.scala:1: error: object apache is not a member of package org
    import org.apache.spark.sql.{SparkSession}
               ^
    Survey.scala:5: error: not found: value SparkSession
    val spark= SparkSession.builder
               ^
    two errors found
    hadoop@namenode1:/usr/local/spark/examples/src/main/scala/org/apache/spark/examples$

[Spark : how to run spark file from spark shell](https://stackoverflow.com/q/27717379) — zero323, Oct 17 '18 at 16:33

score 1 · Answer 1 · answered Oct 17 '18 at 16:13

1

To submit spark jobs, either you have to use spark-submit command or execute scala scripts in spark-shell. Apache Livy provides a REST API to submit spark jobs as well.

answered Oct 17 '18 at 16:13

Apurba Pandey

1,061
10
21

score 0 · Answer 2 · answered Oct 17 '18 at 16:45

you need create sparkSession exemple :

import org.apache.spark.sql.{SparkSession}
val spark= SparkSession.builder
  .master("local")
  .appName("MYAPP")
  .getOrCreate()

val survey = spark.read.format("com.databricks.spark.csv").option("header","true").option("nullValue","NA").option("timestampFormat","yyyy-MM-dd'T'HH:mm:ss").option("mode","failfast").option("inferchema","true").load("/tmp/survey.csv")

// for udf you need

import org.apache.spark.sql.functions._
val parseGenderUDF = udf( parseGender _ )

i hop this help you

i tried what u said but this failed and i have updated it in question now — Ashish Mishra, Oct 17 '18 at 17:25

Ashish Mishra · Accepted Answer · 2018-12-10T09:26:35.817

0

I have found an alternative (by cricket-007)

spark-shell -i survey.scala

But this takes time in configuring spark-shell it seems.

and this is not what I want

edited Dec 10 '18 at 09:26

answered Oct 27 '18 at 05:34

Ashish Mishra

510
4
18

Can we write Scala/Spark generic dynamically written code

3 Answers3