
Issue: I made a wheel out of a very basic module and installed it on a Databricks cluster. When I create a job of type 'Python wheel', the job fails to run because it cannot find the package.


The setup is very simple. I have a source code folder:

src
|-app_1
  |- __init__.py
  |- main.py

Where main.py contains:

def func(): 
    print('Hello world!')

Then, I do the following:

  1. Build src as a wheel, demo-0.0.0-py3-none-any.whl (a minimal build setup is sketched after this list).

  2. Install demo-0.0.0-py3-none-any.whl on the Databricks cluster. I then validate that the wheel was built and installed correctly: running from app_1.main import func and then calling func() succeeds. This is the only wheel installed on the cluster.

  3. Create a job of type Python wheel, set the package name to app_1 and the entry point to main.func. When I run the job, I get an error that app_1 cannot be found.
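
For reference, a minimal build setup that could produce such a wheel might look like this (the distribution name and version are taken from the wheel filename; everything else here is an assumption, since the question does not show the build files):

# setup.py - sketch of a src-layout build for the wheel above
from setuptools import setup, find_packages

setup(
    name="demo",
    version="0.0.0",
    package_dir={"": "src"},
    packages=find_packages(where="src"),  # picks up app_1
)

Built with, for example, python -m build --wheel or python setup.py bdist_wheel.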

(Screenshot: Job Config)

(Screenshot: Error Banner)

Iga Quintana

4 Answers


You have to give your wheel - in setup() - the same name that you will call in the Databricks task scheduler:

setup(
    name="app_1.main",
    ...
)

In the Databricks task scheduler: Package name: app_1.main and Entry point: func.

Under the hood, Databricks reads the metadata from the package and treats the entire wheel as one package, effectively doing import app_1.main.

If your package doesn't have the same name in the setup metadata, the scheduler doesn't work.
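
To see why the names must line up, here is a rough approximation of what the wheel task appears to do (inferred from the PackageNotFoundError reported in the comments, not actual Databricks code):

from importlib import import_module, metadata

package_name = "app_1.main"   # "Package name" field in the job UI
entry_point = "func"          # "Entry point" field in the job UI

# 1. Look up the installed wheel by its distribution name (name= in setup()).
#    Raises importlib.metadata.PackageNotFoundError if no installed
#    distribution has exactly that name.
metadata.distribution(package_name)

# 2. Import the same string as a module path and call the entry point,
#    i.e. app_1.main.func()
module = import_module(package_name)
getattr(module, entry_point)()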


You can keep __init__.py empty. In setup.cfg, declare an entry point function:

[options.entry_points]
console_scripts = 
    sample_entrypoint_name = app_1.main:func

In the Databricks job UI, simply use the entry point name (sample_entrypoint_name). If you have argparse declared inside your func function, pass its arguments in the Parameters section.
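
As a rough illustration of that last point (the --name flag below is made up, not part of the question), func could parse the job's parameters itself:

# app_1/main.py - sketch of an entry point that reads the job's Parameters
import argparse

def func():
    parser = argparse.ArgumentParser()
    parser.add_argument("--name", default="world")
    args = parser.parse_args()   # reads the arguments passed by the job
    print(f"Hello {args.name}!")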

Mauli

I think that you need to set the package to app_1.main and the entry point to func, because the documentation says that it will call $packageName.$entryPoint(), and in your case the full command is app_1.main.func().

Alex Ott
  • Same result unfortunately: `Run result unavailable: job failed with error message Python wheel with name app_1.main could not be found`. Driver logs: `PackageNotFoundError: app_1.main` – Iga Quintana Mar 04 '22 at 18:46
  • Ah, you need to add a dependent library by specifying the path to it on DBFS… you simply don't have your code attached to the job – Alex Ott Mar 04 '22 at 18:59
  • Thank you for your comments so far. Specifying a DBFS path is only available for a job of type 'Python'. For a job of type 'Python wheel', it asks for a .whl uploaded as a library; I've also tried uploading it (in addition to having it already installed on the same cluster), but alas, same error. – Iga Quintana Mar 04 '22 at 21:33

The way to get this to work is to update __init__.py with something like:

from my_package import main

Then, in your Databricks job, your entry point would be main.main (assuming you have a function main in main.py).
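
Mapped onto the question's layout, this would look roughly like the following (a sketch inferred from the question, not spelled out in this answer):

# src/app_1/__init__.py - re-export the module so it resolves via the package
from app_1 import main   # equivalently: from . import main

with Package name set to app_1 and Entry point set to main.func in the job.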