I have a project for which I want to be able to run some entry points on Databricks. I used dbx for that, with the following `deployment.yaml` file:
```yaml
build:
  python: "poetry"
environments:
  default:
    workflows:
      - name: "test"
        existing_cluster_id: "my-cluster-id"
        spark_python_task:
          python_file: "file://tests/test.py"
```
I'm able to run the test script with the `execute` command:

```bash
poetry run dbx execute --cluster-id=my-cluster-id test
```

My problem with this option is that it launches the script interactively, and I can't really retrieve the executed code on Databricks, except by looking at the cluster's logs.
So I tried using the `deploy` and `launch` commands, such that a proper job is created and run on Databricks:

```bash
poetry run dbx deploy test && poetry run dbx launch test
```
However, the job run fails with the following error, which I don't understand:

```
Run result unavailable: job failed with error message
Library installation failed for library due to user error. Error messages:
'Manage' permissions are required to modify libraries on a cluster
```
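My understanding (which may well be wrong) is that `dbx deploy` builds the project with Poetry and then tries to attach the resulting package as a library on the existing cluster, which would require "Manage" permission on that shared cluster. One thing I considered, purely as a guess since I haven't verified that dbx honours it, is declaring an empty Jobs-API-style `libraries` list on the workflow so that nothing gets installed on the cluster:

```yaml
environments:
  default:
    workflows:
      - name: "test"
        existing_cluster_id: "my-cluster-id"
        libraries: []  # guess: ask dbx / the Jobs API to attach no libraries (unverified)
        spark_python_task:
          python_file: "file://tests/test.py"
```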
In any case, what do you think is the best way to run a job that can be traced on Databricks from my local machine?
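For reference, the end-to-end flow I'm hoping for is something like the sketch below; as far as I understand, `dbx launch` accepts a `--trace` flag that follows the run until it finishes (please correct me if I've misread the docs):

```bash
# Deploy the workflow definition (uploading local files referenced
# via file:// in deployment.yaml) to the Databricks workspace.
poetry run dbx deploy test

# Launch the deployed workflow as a proper job run; --trace should
# block and stream the run status, so the run and its output end up
# recorded in the Databricks Jobs UI instead of running interactively.
poetry run dbx launch test --trace
```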