10

Does anybody know the storage limits for running Google Colab? I seem to run out of space after uploading a 22 GB zip file and then trying to unzip it, which suggests that less than ~40 GB of storage is available. At least this is my experience running the TPU instance.

Ferhat
  • Link: https://stackoverflow.com/questions/50260565/how-to-increase-google-colab-storage – yusuf hayırsever Oct 27 '18 at 15:48
  • "Just use GPU runtime" doesn't work anymore (or may not be needed), but I would consider writing some custom data loaders which uses `zipfile` to load data straight from the archive. Unzipping e.g. COCO dataset takes forever anyway, even if you have the storage to do so. – Tomasz Gandor Feb 15 '20 at 10:47

2 Answers

14

Presently, the amount of local storage in Colab depends on the chosen hardware accelerator runtime type:

# Hardware accelerator none
!df -h .
Filesystem      Size  Used Avail Use% Mounted on
overlay          49G   22G   26G  46% /

# Hardware accelerator GPU
!df -h .
Filesystem      Size  Used Avail Use% Mounted on
overlay         359G   23G  318G   7% /

# Hardware accelerator TPU
!df -h .
Filesystem      Size  Used Avail Use% Mounted on
overlay          49G   22G   26G  46% /

Even if you don't need a GPU, switching to that runtime type will provide you with an extra ~310 GB of storage space.
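
To check the available space from Python rather than shelling out to df, here is a small sketch using only the standard library (the path '/' is the overlay filesystem shown above):

import shutil

# Query the filesystem backing the notebook's root directory.
total, used, free = shutil.disk_usage('/')
gib = 1024 ** 3
print('total: %.1f GiB, used: %.1f GiB, free: %.1f GiB' % (total / gib, used / gib, free / gib))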

marsipan
  • Thank you for the update. Yes, it makes sense to use the GPU runtime just for the extra storage. – Ferhat Sep 19 '19 at 14:01
  • "Resources not guaranteed". Today, 2020-09-20, I tried the GPU instance and I got ~70GB of overlay space and the hardware accelerator none got ~110GB of space. All this to say, it's super YMMV! – nelsonjchen Sep 20 '20 at 18:15
  • Today, I got 108GB / 69GB / 108GB. – idontknow Dec 07 '20 at 23:58

13

Yes, the Colab notebook local storage is about 40 GiB right now. One way to see the exact value (in Python 3):

import subprocess

# Run `df -h` and print its output as text.
p = subprocess.Popen('df -h', shell=True, stdout=subprocess.PIPE)
print(str(p.communicate()[0], 'utf-8'))

However: for large amounts of data, local storage is a non-optimal way to feed the TPU, which is not connected directly to the machine running the notebook. Instead, consider storing your large dataset in GCP storage, and sourcing that data from the Colab notebook. (Moreover, the amount of Colab local storage may change, and the Colab notebook itself will expire after a few hours, taking local storage with it.)
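
As a rough illustration of that setup, here is a sketch of streaming TFRecord shards directly from a GCS bucket into a tf.data pipeline (the bucket name and shard pattern are hypothetical, and a reasonably recent TensorFlow is assumed):

import tensorflow as tf
from google.colab import auth

auth.authenticate_user()  # needed if the bucket is not public

BUCKET = 'gs://my-dataset-bucket'        # hypothetical bucket
PATTERN = BUCKET + '/train-*.tfrecord'   # hypothetical shard naming

# TensorFlow's file APIs understand gs:// paths, so nothing is copied to local disk.
files = tf.io.gfile.glob(PATTERN)
print('found %d shards' % len(files))

# Stream records straight from GCS; this is the usual way to feed a TPU.
dataset = tf.data.TFRecordDataset(files).batch(32)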

Take a look at the canonical TPU Colab notebook. At the bottom are some next steps, which include a link to Searching Shakespeare with TPUs. That notebook contains the following code fragment, which demonstrates GCP authentication to your Colab TPU:

import json
import os

import tensorflow as tf
from google.colab import auth

auth.authenticate_user()

if 'COLAB_TPU_ADDR' in os.environ:
  TF_MASTER = 'grpc://{}'.format(os.environ['COLAB_TPU_ADDR'])

  # Upload credentials to the TPU.
  with tf.Session(TF_MASTER) as sess:
    with open('/content/adc.json', 'r') as f:
      auth_info = json.load(f)
    tf.contrib.cloud.configure_gcs(sess, credentials=auth_info)
  # Now credentials are set for all future sessions on this TPU.
else:
  TF_MASTER = ''

Derek T. Jones
  • @DerekT.Jones I work in Colab with Keras instead of pure TensorFlow, which means I don't declare a `tf.Session`. Would the code above also work in my case? – NeStack Aug 19 '19 at 08:41