
I have a scenario in which we connect Apache Spark to SQL Server, load table data into Spark, and generate a Parquet file from it.

Here is a snippet of my code:

val database = "testdb"
val jdbcDF = (spark.read.format("jdbc")
  .option("url", "jdbc:sqlserver://DESKTOP-694SPLH:1433;integratedSecurity=true;databaseName=" + database)
  .option("dbtable", "employee")
  .option("driver", "com.microsoft.sqlserver.jdbc.SQLServerDriver")
  .load())
jdbcDF.write.parquet("/tmp/output/people.parquet")

It works fine in spark-shell, but I want to automate it from Windows PowerShell or a Windows Command Script (batch file) so that it can become part of a SQL Server job.

I would appreciate any suggestions or leads.

  • Does this answer your question? [Spark : how to run spark file from spark shell](https://stackoverflow.com/questions/27717379/spark-how-to-run-spark-file-from-spark-shell) – mazaneicha Dec 15 '21 at 22:54

1 Answer


I have been able to do it myself. I will list the steps below so anyone can get help from them.

  1. Put your spark-shell code into a Scala file, as a program or Scala app (a minimal sketch follows this list).
  2. Build the Spark Scala app using SBT or Maven, with the Spark dependencies declared (a sample build.sbt also follows).
  3. Make sure you can compile and run your Spark Scala app successfully.
  4. Package or assemble your Scala app into a jar file; Assembly produces a fat jar, which is what I used.
  5. Use spark-submit to call the jar file of your Spark app from a Windows batch file; this automates your Spark code (an example batch file is shown at the end).
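
For step 1, here is a minimal sketch of the spark-shell snippet turned into a standalone app. The object name EmployeeToParquet and the output path are just examples; the connection details are the ones from the question:

import org.apache.spark.sql.SparkSession

object EmployeeToParquet {
  def main(args: Array[String]): Unit = {
    // Create the SparkSession that spark-shell normally provides as "spark"
    val spark = SparkSession.builder()
      .appName("EmployeeToParquet")
      .getOrCreate()

    val database = "testdb"
    // Read the employee table from SQL Server over JDBC
    val jdbcDF = spark.read.format("jdbc")
      .option("url", "jdbc:sqlserver://DESKTOP-694SPLH:1433;integratedSecurity=true;databaseName=" + database)
      .option("dbtable", "employee")
      .option("driver", "com.microsoft.sqlserver.jdbc.SQLServerDriver")
      .load()

    // Write the table out as a Parquet file
    jdbcDF.write.parquet("/tmp/output/people.parquet")

    spark.stop()
  }
}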
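
For steps 2 and 4, a possible build.sbt for SBT with the sbt-assembly plugin could look like the following; the library versions are assumptions, so adjust them to match your Spark and Scala installation:

// build.sbt
name := "employee-to-parquet"
version := "0.1"
scalaVersion := "2.12.15"

libraryDependencies ++= Seq(
  // "provided" because spark-submit supplies Spark at runtime
  "org.apache.spark" %% "spark-sql" % "3.1.2" % "provided",
  // SQL Server JDBC driver, bundled into the fat jar by sbt-assembly
  "com.microsoft.sqlserver" % "mssql-jdbc" % "9.4.0.jre8"
)

// project/plugins.sbt
addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "1.1.0")

Running sbt assembly should then produce the fat jar under target/scala-2.12/.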
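
For step 5, a Windows batch file that the job can call might look like this; SPARK_HOME, the jar path, and the class name are assumptions that must match your environment:

:: run_spark_job.cmd
set SPARK_HOME=C:\spark
"%SPARK_HOME%\bin\spark-submit.cmd" ^
  --class EmployeeToParquet ^
  --master local[*] ^
  C:\jobs\employee-to-parquet-assembly-0.1.jar

A SQL Server Agent job step of type Operating system (CmdExec) can then run this batch file on a schedule.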