I'm having various problems accessing GCP Vertex AI Workbench managed notebooks. I could really use some suggestions about recovering and avoiding further failures.

The original behavior (two days ago) was:

  • After working in the JupyterLab instance for a bit over an hour (creating a handful of notebooks within the instance), some kind of connectivity is lost.
    • Inside the JupyterLab interface: cells won't run, the notebook is unable to save to disk or export, and restarting the kernel doesn't work.
    • On-screen error pop-up: 502, with message mentioning "bad gateway" or something like that
    • Back out in the console screen for managing my Workbench instances, I was able to use the Reset command to get the instance back to a working state.

**Note:** This instance was provisioned with a setting to suspend after one hour of idle time. It's not obviously relevant; the failure was a little more than an hour after creation, but there certainly wasn't an hour of idle time before things went to heck.

Today, I came back and was again able to work in the same instance for a bit over an hour, but then the same symptoms set in: I couldn't execute code and couldn't save the notebook.

However, things are worse now, because hitting Reset has led to an endlessly spinning cursor. The instance won't complete its reset and can't start. When I hover over the spinning cursor where the OPEN JUPYTERLAB button ought to be, a hover box says "Setting up proxy to JupyterLab".

The hover text for the Instance status says: "Provisioning".

I also tried creating a new notebook instance from the Workbench console screen, and it's stuck in the same condition: just spinning, never reaching the running state. If I try to Reset it, a small pop-up appears at the bottom of the screen (screenshot).

Subsequently, the Reset button shows its own hover text (screenshot).

At a minimum, I'm hoping to regain access to the initial instance once to recover some code in the notebooks (and go run it in a less flaky cloud service). At best, you could help me manage this so that this GCP service is actually viable over time for me.

gogasca
David Kaufman
  • If you have premium support, you can check with [GCP Support](https://cloud.google.com/support-hub) to further check your issue since this is specific to your project. – Anjela B Jun 17 '22 at 01:25
  • I don't have any solution, but I have also noticed the same problem over the past week or two. This seems to be a new issue. – carbocation Jun 27 '22 at 20:20
  • It could also be related to disk management. Are you able to connect to the machine's serial port? Perhaps it can give you more information about what's going on. Also, I'd try to rescue the files using a direct SSH connection. – gidutz Jun 30 '22 at 16:10
  • A problem was identified and corrected a few days ago (losing access after an hour, and the Reset issue). As Anjela suggested, please open a ticket. New instances should not have this issue. Thanks – gogasca Jul 02 '22 at 08:10
  • Any update on this? I'm having the same issue and it's extremely frustrating for a time-sensitive project. How was this resolved? – Tanishq Kumar Jul 29 '22 at 18:18
  • You should create new instances with a new image (after August 17th); we did major updates for reliability. We will also release a new Upgrade endpoint in the next few weeks. – gogasca Sep 02 '22 at 04:33

0 Answers