I'd like to connect Colab to a PAID TPU (upgrading from the free TPU). I created a JSON key using this guide: https://cloud.google.com/docs/authentication/production#auth-cloud-explicit-python, then uploaded it to Colab. I'm able to connect to my storage but not to the TPU:

%tensorflow_version 2.x
import tensorflow as tf
import os
from google.cloud import storage

os.environ['GOOGLE_APPLICATION_CREDENTIALS'] = './gcp-permissions.json'

# Authenticated API request - works.
storage_client = storage.Client.from_service_account_json(
    'gcp-permissions.json')
print(list(storage_client.list_buckets()))

# Accessing the TPU - does not work. The request times out.
cluster_resolver = tf.distribute.cluster_resolver.TPUClusterResolver(
    tpu='My-TPU-Name',
    zone='us-central1-a',
    project='My-Project-Name'
)

I've also tried the TPUClusterResolver call with just the TPU name, and with credentials='gcp-permissions.json' - same result. I've double-checked in the GCP console that my TPU is up and running. It is not preemptible. What am I missing?

Thanks!

user9676571

1 Answer

So it looks like you're trying to connect from a Colab notebook to a paid TPU in your own Google Cloud project, is that right? That won't work, because the Colab runtime is backed by a GCE VM that lives in a different project than your own My-Project-Name. Instead, create a GCE VM in that same project and run your training script from that VM. Check out this tutorial: https://cloud.google.com/tpu/docs/quickstart.
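
For reference, once the script runs on a GCE VM in the same project and zone as the TPU, connecting looks roughly like the sketch below (it reuses the TPU name from the question; list_logical_devices assumes TF 2.1 or later):

import tensorflow as tf

# Minimal sketch, assuming this runs on a GCE VM in the same project
# and zone as the TPU node 'My-TPU-Name'.
resolver = tf.distribute.cluster_resolver.TPUClusterResolver(tpu='My-TPU-Name')
tf.config.experimental_connect_to_cluster(resolver)
tf.tpu.experimental.initialize_tpu_system(resolver)
print('TPU devices:', tf.config.list_logical_devices('TPU'))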

jysohn
  • Hmm. If the problem is that the Colab and GCP projects are different, how come I was able to access my GCP buckets from Colab? The link I quoted talks about how to set up permissions to access GCP resources from outside GCP. Do you believe TPUs are only accessible from GCP-hosted clients? – user9676571 Jan 20 '20 at 20:27
  • You can authenticate from a GCE VM in the Colab project to access the GCS bucket. However, you can't make the GCE VM in the Colab project share a network with the TPU in your own project, since you don't have access to the Colab GCP project. One is credentials (GCS); the other is networking. – jysohn Jan 21 '20 at 21:57
  • Sorry, not following. If I can access a free TPU from Colab, why not a paid TPU? If I can't, why isn't there an error message, just a timeout? If I can't switch projects, why does TPUClusterResolver have a 'project' argument? – user9676571 Jan 22 '20 at 01:34
  • Incidentally, I tried to set up a paid VM client inside GCE instead of using a Colab client, but this happened: https://stackoverflow.com/questions/59851553/huggingface-bert-tpu-fine-tuning-works-on-colab-but-not-in-gcp – user9676571 Jan 22 '20 at 01:49
  • As far as I know, the project argument is only for when the project cannot be identified from the GCE VM's metadata. The key here is that the GCE VM and the TPU need to be on the same network so that they can talk to each other. Unfortunately, the Colab VM is in a network that the Colab team maintains, whereas your TPU is in its own network in your own project, so the two cannot talk to each other. My recommendation would be to set up a separate GCE VM in your own project and drive the TPU from there. You can set up Jupyter notebook servers on that GCE VM as well. – jysohn Jun 05 '20 at 18:23
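
To illustrate that last suggestion, a minimal hypothetical sketch of driving the TPU from a GCE VM in the same project (the TPU name is the one from the question, and the Keras model is only a placeholder):

import tensorflow as tf

# Hypothetical sketch: run from a GCE VM in the same project/zone as the TPU.
resolver = tf.distribute.cluster_resolver.TPUClusterResolver(tpu='My-TPU-Name')
tf.config.experimental_connect_to_cluster(resolver)
tf.tpu.experimental.initialize_tpu_system(resolver)
# Newer TF versions expose this as tf.distribute.TPUStrategy(resolver).
strategy = tf.distribute.experimental.TPUStrategy(resolver)

with strategy.scope():
    # Placeholder model; replace with the real training setup.
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(128, activation='relu', input_shape=(784,)),
        tf.keras.layers.Dense(10, activation='softmax'),
    ])
    model.compile(optimizer='adam',
                  loss='sparse_categorical_crossentropy',
                  metrics=['accuracy'])

# model.fit(...) then runs the training steps on the TPU cores.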