
I have a Spark job which reads a source table, does a number of map / flatten / DataFrame operations, and then stores the results from a temp table into a separate table we use for reporting. Currently this job is run manually using the spark-submit script. I want to schedule it to run every night.

Is there any way to schedule a Spark job for batch processing, similar to a nightly batch ETL?
Aryan

2 Answers


There is no built-in scheduling mechanism in Spark that will help. A cron job seems reasonable. Other options are:

  1. Azkaban: https://azkaban.github.io/
  2. Oozie Spark action: https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.6.1/bk_spark-component-guide/content/ch_oozie-spark-action.html
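For the cron route, a minimal sketch of a wrapper script plus crontab entry (all paths, the master URL, the main class, and the jar name are assumptions for illustration — substitute your own):

```shell
#!/bin/sh
# run_nightly_job.sh -- wrapper script so cron runs spark-submit with a full
# environment (cron jobs get a minimal PATH, so set what spark-submit needs).
# The paths below are hypothetical.
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
export SPARK_HOME=/opt/spark

"$SPARK_HOME/bin/spark-submit" \
  --master yarn \
  --deploy-mode cluster \
  --class com.example.NightlyEtl \
  /opt/jobs/nightly-etl.jar \
  >> /var/log/nightly-etl.log 2>&1

# Then install a crontab entry (crontab -e) to run it every night at 2:00 AM:
# 0 2 * * * /opt/jobs/run_nightly_job.sh
```

Redirecting stdout/stderr to a log file matters here: cron otherwise mails or discards the output, and you lose the spark-submit failure messages.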
dassum

If you know Python, you can check out Airflow. Airflow lets you schedule a task on a regular basis like cron, but is more flexible: tasks can depend on each other, and it makes it easy to define complex relationships between them even in a large distributed environment. You can check the link below:

How to run Spark code in Airflow?
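A minimal DAG sketch for this use case, assuming Airflow 2.4+ with the `apache-airflow-providers-apache-spark` package installed; the DAG id, file paths, main class, and connection id are all hypothetical placeholders:

```python
# Hypothetical Airflow DAG: run the existing spark-submit job nightly at 2:00 AM.
# Requires: pip install apache-airflow apache-airflow-providers-apache-spark
from datetime import datetime

from airflow import DAG
from airflow.providers.apache.spark.operators.spark_submit import SparkSubmitOperator

with DAG(
    dag_id="nightly_spark_etl",           # assumed name
    schedule="0 2 * * *",                 # cron expression: every night at 2 AM
    start_date=datetime(2024, 1, 1),
    catchup=False,                        # don't backfill missed nightly runs
) as dag:
    run_etl = SparkSubmitOperator(
        task_id="run_etl",
        application="/opt/jobs/nightly-etl.jar",  # assumed path to your job jar
        java_class="com.example.NightlyEtl",      # assumed main class
        conn_id="spark_default",                  # Spark connection configured in Airflow
    )
```

The operator shells out to spark-submit using the master URL stored in the Airflow connection, and the scheduler retries, alerts, and records history for each run — the parts a bare cron job makes you build yourself.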

Nayan Sharma