
Is it possible to run a Databricks job from a notebook using code, and if so, how?

I have a job with multiple tasks and many contributors, and we already have a job set up to execute it all. Now we want to run the job from a notebook, both to test new features without creating a new task in the job and to run the job multiple times in a loop, for example:

for i in [1,2,3]:
    run job with parameter i

Regards

Joe
  • Just a thought: if, in your case, you could make your job parameterized, you could pass different parameters for each run. Another approach I can think of is to use the Jobs API to trigger the job multiple times based on the success of the previous run, but that would cost you a lot of delay between two runs, since each run would include the cluster start time. – Nikunj Kakadiya Mar 11 '22 at 17:26
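
A minimal sketch of that Jobs API approach, assuming hypothetical placeholders for the workspace URL, personal access token, and job ID, and using the run-now endpoint of the Jobs REST API 2.1:

    import requests

    # Placeholders you must fill in for your workspace.
    DATABRICKS_HOST = "https://<your-workspace>"
    TOKEN = "<your-personal-access-token>"
    JOB_ID = 12345  # hypothetical job id

    for i in [1, 2, 3]:
        # Trigger one run of the existing job, passing i as a notebook parameter.
        resp = requests.post(
            f"{DATABRICKS_HOST}/api/2.1/jobs/run-now",
            headers={"Authorization": f"Bearer {TOKEN}"},
            json={"job_id": JOB_ID, "notebook_params": {"i": str(i)}},
        )
        resp.raise_for_status()
        print(resp.json()["run_id"])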

2 Answers


What you need to do is the following:

  1. Install the databricksapi package: %pip install databricksapi==1.8.1

  2. Create your job and return an output. You can do that by exiting the notebook like this:

     import json
     from databricksapi import Workspace, Jobs, DBFS

     # _result is whatever value your notebook produced; return it as a JSON string.
     dbutils.notebook.exit(json.dumps({"result": f"{_result}"}))

If you want to pass a DataFrame, you have to serialize it as a JSON dump too; there is official Databricks documentation about that. Check it out.
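
For example, a minimal sketch of returning a small Spark DataFrame as JSON, assuming a DataFrame named df (collecting rows like this is only sensible for small results):

    import json

    # toJSON() yields one JSON string per row of the DataFrame.
    rows = df.toJSON().collect()
    dbutils.notebook.exit(json.dumps({"result": rows}))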

  3. Get the job ID; you will need it later. You can get it from the job details page in Databricks, or look it up programmatically (see the sketch after the code below).

  4. In the executor notebook, you can use the following code.

     import json
     import time

     from databricksapi import Jobs

     def run_ks_job_and_return_output(params):
         # Read the workspace URL and an API token from the notebook context.
         context = json.loads(dbutils.notebook.entry_point.getDbutils().notebook().getContext().toJson())
         url = context['extraContext']['api_url']
         token = context['extraContext']['api_token']

         # Initialize a Jobs client and trigger a run of the existing job.
         jobs_instance = Jobs.Jobs(url, token)
         runs_job_id = jobs_instance.runJob(****************, 'notebook',
                                            params)  # **** is the job id

         # Poll the list of completed runs until ours shows up.
         run_is_not_completed = True
         while run_is_not_completed:
             current_run = [run for run in jobs_instance.runsList('completed')['runs']
                            if run['run_id'] == runs_job_id['run_id']
                            and run['number_in_job'] == runs_job_id['number_in_job']]
             if len(current_run) == 0:
                 time.sleep(30)
             else:
                 run_is_not_completed = False
                 current_run = current_run[0]
                 print(f"Result state: {current_run['state']['result_state']}, "
                       f"you can check the output at: {current_run['run_page_url']}")
                 # Return the value the notebook passed to dbutils.notebook.exit.
                 note_output = jobs_instance.runsGetOutput(runs_job_id['run_id'])['notebook_output']
                 return note_output

     run_ks_job_and_return_output({'parm1': 'george',
                                   'variable': "values1"})
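If you prefer to look up the job ID from step 3 programmatically rather than from the UI, here is a sketch that reuses the same notebook-context trick and the jobs/list endpoint of the Jobs REST API 2.1 ("My job" is a hypothetical job name):

    import json
    import requests

    context = json.loads(dbutils.notebook.entry_point.getDbutils().notebook().getContext().toJson())
    host = context['extraContext']['api_url']
    token = context['extraContext']['api_token']

    # List the jobs in the workspace and pick the one whose name matches.
    resp = requests.get(f"{host}/api/2.1/jobs/list",
                        headers={"Authorization": f"Bearer {token}"})
    job_id = next(job["job_id"] for job in resp.json()["jobs"]
                  if job["settings"]["name"] == "My job")
    print(job_id)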

If you want to run the job many times in parallel, you can do the following. (First, be sure that you have increased the max concurrent runs in the job settings.)

from multiprocessing.pool import ThreadPool

pool = ThreadPool(1000)
# snapshots_list is assumed to be your own list of work items;
# each j is passed through to the job as a parameter.
results = pool.map(lambda j: run_ks_job_and_return_output({'table': 'george',
                                                           'variable': "values1",
                                                           'j': j}),
                   [str(x) for x in range(2, len(snapshots_list))])
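
Note that the ThreadPool size only caps the number of client-side threads waiting on runs; the actual parallelism is limited by the job's maximum concurrent runs setting, which defaults to 1.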

There is also the possibility of saving the whole HTML output, but maybe you are not interested in that. In any case, I will answer that in another post on Stack Overflow.

Hope it helps.

George Sotiropoulos

You can use the following steps:

Note-01:

dbutils.widgets.text("foo", "fooDefault", "fooEmptyLabel")
dbutils.widgets.text("foo2", "foo2Default", "foo2EmptyLabel")
result = dbutils.widgets.get("foo")+"-"+dbutils.widgets.get("foo2")
def display():
     print("Function Display: "+result)
dbutils.notebook.exit(result)

Note-02:

thislist = ["apple", "banana", "cherry"]
for x in thislist:
    # "Note-01 path" is the workspace path of Note-01; 60 is the timeout in seconds.
    dbutils.notebook.run("Note-01 path", 60, {"foo": x, "foo2": "Azure"})


  • Hi, yes, I know this is how to run other notebooks from a notebook, but what I want to do is run a job that I have already created in Databricks from a notebook. – Joe Jan 07 '22 at 15:07
  • @Joe I am looking at the same problem. Did you find any solution for this? – Umang singhal Jan 28 '22 at 06:31
  • @Joe I am looking at the same problem. Did you find any solution for this? – Deepak Oct 10 '22 at 10:46