
I am relatively new to GCP and am trying to schedule a notebook on GCP to run every day. The notebook depends on several libraries and on other Python modules/scripts. When I schedule it with the Cloud Scheduler (as shown in the image), the logs show errors at the import statements, both for the libraries and for my own Python modules.

[Screenshot: schedule_on_gcp]

I also created a requirements.txt file, but the scheduler doesn't seem to read it.
Am I doing something wrong?

Can anyone help or point me toward a possible solution? I have been stuck on this for a few days, so any help would be highly appreciated.

PS: Cloud Functions would be my last option in case I'm not able to run it this way.

2 Answers


The problem is that there are two different environments:

  1. The notebook document itself.
  2. The Docker container that the Notebook Executor uses when you click Execute. This container is passed to the Executor backend (Notebooks API + Vertex AI custom job). Because you installed the dependencies inside the notebook itself (that is, on the managed notebook's underlying infrastructure), they are not included in the container, so the execution fails. You need to pass a container that includes Selenium.
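A quick way to confirm which environment is missing a dependency is to make the first cell of the notebook report any modules it cannot find; when the scheduled run fails, the execution output then names exactly what the Executor's container lacks. This is a minimal sketch, and the module list is hypothetical (replace it with your own imports). Note that import names can differ from pip package names (for example `bs4` vs `beautifulsoup4`).

```python
import importlib.util
import sys


def find_missing_modules(module_names):
    """Return the top-level module names that this interpreter cannot locate."""
    missing = []
    for name in module_names:
        # find_spec returns None when no finder can locate the module
        if importlib.util.find_spec(name) is None:
            missing.append(name)
    return missing


# Hypothetical list: replace with the modules your notebook actually imports.
REQUIRED_MODULES = ["selenium", "pandas"]

if __name__ == "__main__":
    missing = find_missing_modules(REQUIRED_MODULES)
    print("Interpreter:", sys.executable)
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All required modules are available.")
```

Running the same cell manually and in a scheduled execution should show different results if the two environments really differ.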

If you need to build a custom container, I would do the following:

  1. Create a custom container:
# Dockerfile.example
FROM gcr.io/deeplearning-platform-release/tf-gpu:latest
# pip install has no -y flag (that is an apt-get option); it installs without prompting
RUN pip install selenium

Then build the image and push it to a registry that the Executor can pull from:

PROJECT="my-gcp-project"
docker build . -f Dockerfile.example -t "gcr.io/${PROJECT}/tf-custom:latest"
gcloud auth configure-docker
docker push "gcr.io/${PROJECT}/tf-custom:latest"

  2. Specify the container when launching the execution (under "Custom container").
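Since the question already has a requirements.txt, one option is to bake that file into the custom image instead of listing packages one by one in the Dockerfile. The sketch below assumes requirements.txt sits next to the Dockerfile in the build context; the helper name `render_dockerfile` is hypothetical. It prints a Dockerfile you could save (for example as Dockerfile.requirements, a hypothetical name) and pass to the same docker build command.

```python
# Base image taken from the example above; swap in the image your notebook uses.
BASE_IMAGE = "gcr.io/deeplearning-platform-release/tf-gpu:latest"


def render_dockerfile(base_image, requirements_file="requirements.txt"):
    """Build Dockerfile text that installs everything listed in requirements_file."""
    lines = [
        f"FROM {base_image}",
        # Copy the requirements file from the build context into the image.
        f"COPY {requirements_file} /tmp/requirements.txt",
        # Install the packages so code running in this container can import them.
        "RUN pip install --no-cache-dir -r /tmp/requirements.txt",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Redirect to a file, e.g.: python render_dockerfile.py > Dockerfile.requirements
    print(render_dockerfile(BASE_IMAGE), end="")
```

This only covers pip packages. Your own helper modules (the "other Python scripts" from the question) would presumably also need to be copied into the image or installed as a package, on the assumption that only the notebook file itself is handed to the Executor.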


gogasca
  • I'm pretty sure this is the case. But I don't know how I should pass the container. Could you please guide me with an example or steps? It would be highly appreciated! Thanks. – Nikita Kini Mar 27 '22 at 10:56
  • Could you help me out here? This is a somewhat critical issue for me right now. I'd appreciate your guidance! :) – Nikita Kini Mar 28 '22 at 10:38
  • SO is not for critical issues; there are no SLOs. I would suggest opening a support case if it's urgent. – gogasca Mar 29 '22 at 03:08
  • Sorry, I just meant I'm very curious about how this can be fixed. – Nikita Kini Mar 29 '22 at 07:56
  • Thanks for this, gogasca. I'm trying to schedule it from a Vertex AI user-managed notebook, where there's no option for a custom container. – Nikita Kini Mar 29 '22 at 13:55
  • I would suggest using Managed Notebooks + Executor; that way you get a supported solution. You are probably using an older notebook image. – gogasca Mar 30 '22 at 03:28
  • Is there a way to schedule a user-managed notebook? – pradeepvaranasi May 04 '23 at 16:56
  • We have a new Private Preview called Workbench Instances, where you can do this. You may need to contact your account manager (AM) to get whitelisted. – gogasca May 05 '23 at 04:45

The error means that the selenium module is missing, so you need to install it. You can use one of the following commands:

python -m pip install -U selenium (you need pip installed)

pip install selenium

or depending on your permissions:

sudo pip install selenium

For python3:

sudo pip3 install selenium
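Inside a notebook, a common way to make sure pip installs into the same interpreter the kernel is running is to invoke it through `sys.executable` rather than whichever `pip` happens to be first on the PATH. This is a minimal sketch; `pip_install_command` and `pip_install` are hypothetical helper names. It installs only into whichever interpreter runs the cell, so whether it helps a scheduled run depends on that cell being part of the executed notebook and on the Executor's container being able to reach PyPI, which is an assumption here.

```python
import subprocess
import sys


def pip_install_command(packages, upgrade=True):
    """Build a pip command bound to the interpreter running this code."""
    # sys.executable avoids installing into a different Python than the kernel's.
    cmd = [sys.executable, "-m", "pip", "install"]
    if upgrade:
        cmd.append("-U")
    cmd.extend(packages)
    return cmd


def pip_install(packages, upgrade=True):
    """Run pip; raises subprocess.CalledProcessError if the install fails."""
    subprocess.check_call(pip_install_command(packages, upgrade))


# Example (not run here): pip_install(["selenium"])
```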

Edit 1:

If you already have selenium installed, check where your Python interpreter is located and where it looks for libraries/packages, including the ones installed with pip. Sometimes Python runs from one location but looks for libraries in a different one, so make sure it is searching the right directory.

Here is an answer that you can use to check if Python is configured correctly.
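As a rough illustration of that check, the standard library can report which interpreter is running, the directories it searches for imports, and where (if anywhere) it would load a given module from. A minimal sketch; the helper name `describe_environment` is hypothetical.

```python
import importlib.util
import sys


def describe_environment(module_name):
    """Report where this interpreter lives and where it would load module_name from."""
    spec = importlib.util.find_spec(module_name)
    return {
        "executable": sys.executable,  # the Python binary actually running
        "search_path": list(sys.path),  # directories searched for imports, in order
        "module_origin": spec.origin if spec is not None else None,  # None if not found
    }


if __name__ == "__main__":
    info = describe_environment("selenium")
    print("Python executable:", info["executable"])
    print("selenium loaded from:", info["module_origin"])
    for entry in info["search_path"]:
        print("  search path:", entry)
```

Comparing this output between a manual run and a scheduled run shows whether the two are even using the same interpreter.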

  • Selenium is installed properly in JupyterLab. The code runs smoothly when run manually, but not when scheduled. My guess is that it's something to do with the scheduled environment. – Nikita Kini Mar 24 '22 at 19:13
  • Added something to the answer that may help. – Andres Fiesco Casasola Mar 24 '22 at 22:05
  • Thanks, Andres. Since I'm running the code in the JupyterLab GCP environment, I don't think the question of different locations arises. Again, the code runs smoothly when run manually but not when scheduled: it fails on every import statement, both for libraries and for my own Python modules. – Nikita Kini Mar 25 '22 at 09:15
  • I've posted an answer explaining why this happens. – gogasca Mar 27 '22 at 16:05