
So this question is similar to "How to pass a python variables to shell script in azure databricks notebook?", but it's slightly different.

I have a notebook that runs another notebook a few times with different arguments. The issue is that one of the arguments needs to become an environment variable used by a shell cell (in this case I pass the name of the directory into which I clone a git repo). Everything works fine if I run the notebooks one by one. However, I was hoping to run them as threads, and in that case the environment variables get overwritten (obviously, since all threads share the same process environment). Is there a smarter way to pass variables between Python and the shell that avoids overwriting them?
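To illustrate why the overwriting happens: `os.environ` is process-wide, so every thread reads and writes the same environment and the last writer wins for all of them. A minimal reproduction outside Databricks:

```python
import os
import threading

# os.environ is shared by all threads in the process, so concurrent
# notebook runs that export the same variable clobber each other.
def set_env(name):
    os.environ["schema_name"] = name  # last writer wins for everyone

threads = [threading.Thread(target=set_env, args=(n,)) for n in ["prod", "dev"]]
for t in threads:
    t.start()
for t in threads:
    t.join()

# Only a single shared value survives -- either "prod" or "dev",
# not one value per thread.
print(os.environ["schema_name"])
```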

Caller notebook:

from multiprocessing.pool import ThreadPool
pool = ThreadPool(10)
pool.starmap(
  lambda schema_name,model_name,branch_name: dbutils.notebook.run(
    "testing_workflow",
    timeout_seconds = 360,
    arguments = {"schema_name":schema_name, 
                 "model_name":model_name, 
                 "branch_name":branch_name}),
  (["prod","sp","develop"], ["dev","sp","DE-1006"])
)

Callee notebook:

import os
import json
os.environ["git_checkout"] = f"""git clone https://{dbutils.secrets.get(scope = "dev", key = "github_key")}@github.com/xxx/image.git --branch {dbutils.widgets.get("branch_name")} {dbutils.widgets.get("schema_name")}"""
os.environ["schema_name"] = "{0}".format(dbutils.widgets.get("schema_name"))
%sh ${git_checkout}
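One direction I was considering (a sketch only, not verified on Databricks): build the command in Python and run it with `subprocess` instead of exporting it to `os.environ` and using `%sh`, so each call keeps its own values and nothing goes through shared process state. Here `token`, `branch`, and `schema` are placeholders for the `dbutils.secrets` / `dbutils.widgets` lookups above, and the `run` parameter is just an injection point for testing:

```python
import subprocess

# Sketch: each invocation carries its own arguments, so concurrent
# notebook runs cannot overwrite each other's values.
def clone_repo(token, branch, schema, run=subprocess.run):
    cmd = [
        "git", "clone",
        f"https://{token}@github.com/xxx/image.git",
        "--branch", branch,
        schema,  # target directory, one per notebook run
    ]
    return run(cmd, check=True)
```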
