So this question is similar to "How to pass a python variable to shell script in azure databricks notebook?", but it's slightly different.
I have a notebook that runs another notebook a few times with different arguments. The issue is that one of the arguments needs to end up in an environment variable read by a shell cell (in this case I pass the name of the directory that I clone the git repo into). Everything works fine if I run the notebooks one by one. However, I was hoping to run them as threads, and in that case the environment variables get overwritten (obviously, since they are process-wide). I was hoping there is some smarter way to pass variables between Python and shell that avoids the overwriting.
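To show what I mean by overwriting, here is a minimal standalone sketch (not my real notebooks) of the race:

import os
import time
from multiprocessing.pool import ThreadPool

def run(schema_name):
    os.environ["schema_name"] = schema_name  # os.environ is shared, process-wide state
    time.sleep(0.1)                          # stand-in for the notebook's actual work
    return os.environ["schema_name"]         # may return the other thread's value

print(ThreadPool(2).map(run, ["prod", "dev"]))  # typically prints ['dev', 'dev'] (or ['prod', 'prod'])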
Caller notebook:
from multiprocessing.pool import ThreadPool

pool = ThreadPool(10)
pool.starmap(
    lambda schema_name, model_name, branch_name: dbutils.notebook.run(
        "testing_workflow",
        timeout_seconds=360,
        arguments={
            "schema_name": schema_name,
            "model_name": model_name,
            "branch_name": branch_name,
        },
    ),
    (["prod", "sp", "develop"], ["dev", "sp", "DE-1006"]),
)
Callee notebook:
import os
import json

# Build the full "git clone" command and hand it to the %sh cell below
# via an environment variable
os.environ["git_checkout"] = f"""git clone https://{dbutils.secrets.get(scope = "dev", key = "github_key")}@github.com/xxx/image.git --branch {dbutils.widgets.get("branch_name")} {dbutils.widgets.get("schema_name")}"""
os.environ["schema_name"] = dbutils.widgets.get("schema_name")
%sh ${git_checkout}
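For context, I know I could sidestep the env variable entirely by shelling out from Python instead of using %sh; a sketch of that idea, reusing the same secret scope and widgets as above:

import subprocess

token = dbutils.secrets.get(scope="dev", key="github_key")
branch = dbutils.widgets.get("branch_name")
schema = dbutils.widgets.get("schema_name")

# Each run builds its own argument list, so nothing is shared through
# the process environment and concurrent runs cannot clobber each other
subprocess.run(
    ["git", "clone", f"https://{token}@github.com/xxx/image.git",
     "--branch", branch, schema],
    check=True,
)

But I'd like to know if there is a cleaner way that keeps the %sh cell.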