
Say I have a bucket subdirectory on Google Cloud Storage whose address is:

gs://test-monkeys-example/training_data/cats

This cats subdirectory contains a bunch of cat images, all of them JPEGs. How would I, in Python, loop through the cats subdirectory and print out the names of all the files in it?

Something like:

for x in directory('gs://test-monkeys-example/training_data/cats'):
    print(x)

Obviously directory('gs://test-monkeys-example/training_data/cats') is not how to do this; it is just pseudocode. How would I actually do this?

sometimesiwritecode

2 Answers


Google Cloud Storage lets you list just the objects whose names begin with a given prefix. You can do this with the client library like so:

from google.cloud import storage

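client = storage.Client()  # picks up Application Default Credentials
bucket = client.bucket('test-monkeys-example')
# The trailing slash keeps names like "training_data/cats_old/..." out of the results
for blob in bucket.list_blobs(prefix='training_data/cats/'):
    print(blob.name)  # full object name, e.g. training_data/cats/<file>.jpg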
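Cloud Storage has a flat namespace: a "folder" such as training_data/cats/ is only a shared name prefix. list_blobs also accepts a delimiter argument; with delimiter='/' only the objects directly under the prefix are returned, and deeper "subfolders" are reported as prefixes instead. The sketch below is a minimal local emulation of those prefix/delimiter semantics over a plain list of names, so it runs without credentials. The function list_with_delimiter and the sample object names are hypothetical and are not part of the client library.

```python
from typing import Iterable, List, Set, Tuple


def list_with_delimiter(
    names: Iterable[str], prefix: str, delimiter: str = "/"
) -> Tuple[List[str], Set[str]]:
    """Hypothetical helper: emulate a prefix/delimiter listing on flat names.

    Returns (direct_objects, sub_prefixes): the names directly "inside"
    prefix, and the distinct deeper "subfolder" prefixes, each ending in
    the delimiter.
    """
    direct: List[str] = []
    sub_prefixes: Set[str] = set()
    for name in names:
        if not name.startswith(prefix):
            continue  # outside the requested prefix
        rest = name[len(prefix):]
        cut = rest.find(delimiter)
        if cut == -1:
            direct.append(name)  # no further delimiter: a direct child
        else:
            # Collapse everything below the next delimiter into one prefix.
            sub_prefixes.add(prefix + rest[: cut + len(delimiter)])
    return direct, sub_prefixes


# Hypothetical object names standing in for a real bucket listing.
sample = [
    "training_data/cats/cat1.jpg",
    "training_data/cats/cat2.jpg",
    "training_data/cats/kittens/k1.jpg",
    "training_data/catsanddogs/mixed.jpg",
    "training_data/dogs/dog1.jpg",
]

objects, subdirs = list_with_delimiter(sample, "training_data/cats/")
for name in objects:
    print(name)
print(sorted(subdirs))
```

With the real client, the equivalent call is bucket.list_blobs(prefix='training_data/cats/', delimiter='/'); after you have iterated over the returned blobs, the iterator's prefixes attribute holds the subfolder prefixes. Calling the sketch with "training_data/cats" (no trailing slash) returns no direct objects and also reports training_data/catsanddogs/, which is why the slash matters.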
Brandon Yarbrough

Use the Datalab storage module (google.datalab.storage):

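import google.datalab.storage as storage

# Keep only keys under the cats "folder"; the trailing slash avoids
# matching sibling prefixes such as "training_data/cats_old/".
cats = [o.key for o in storage.Bucket('test-monkeys-example').objects()
        if o.key.startswith('training_data/cats/')]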

This gives you a list of the matching object keys, i.e. the paths of the cat images.

Alternatively, you could use the Objects class:

# Objects() also requires prefix and delimiter arguments; empty strings are passed here
cats = [o.key for o in storage.Objects('test-monkeys-example', '', '')
        if o.key.startswith('training_data/cats/')]
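The keys collected above are full object names such as training_data/cats/cat1.jpg. If you only want the file names, as the question asks, a small helper can strip the folder prefix and keep just the JPEGs. This is a minimal sketch: the helper jpg_names_in_folder and the sample keys are hypothetical, and it assumes JPEG files are the ones whose names end in .jpg or .jpeg (in any case).

```python
from typing import Iterable, List


def jpg_names_in_folder(keys: Iterable[str], folder: str) -> List[str]:
    """Hypothetical helper: paths of .jpg/.jpeg keys relative to `folder`.

    The folder is normalized to end with "/" so "training_data/cats" does
    not also match keys under a sibling such as "training_data/catsanddogs/".
    """
    prefix = folder.rstrip("/") + "/"
    names: List[str] = []
    for key in keys:
        if not key.startswith(prefix):
            continue  # key lives outside the folder
        relative = key[len(prefix):]
        # Case-insensitive extension check (assumption: JPEGs are *.jpg/*.jpeg).
        if relative.lower().endswith((".jpg", ".jpeg")):
            names.append(relative)
    return names


# Hypothetical keys standing in for the cats list built above.
keys = [
    "training_data/cats/cat1.jpg",
    "training_data/cats/CAT2.JPEG",
    "training_data/cats/labels.csv",
    "training_data/cats/kittens/k1.jpg",
    "training_data/catsanddogs/mixed.jpg",
]

for name in jpg_names_in_folder(keys, "training_data/cats"):
    print(name)
```

Objects in nested folders keep their subfolder in the result (kittens/k1.jpg); apply posixpath.basename to each entry if you want bare file names only. The same helper works on the blob.name values from the google.cloud.storage answer.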

If you don't need the list stored in a variable, the %gcs magic is easier:

%gcs list -o gs://test-monkeys-example/training_data/cats/*

This prints an HTML table of the keys. Note that the argument is a full GCS path, starting with gs://.

yelsayed
  • If I paste in the first example you provided, I get an error saying __init__ requires 4 arguments but only 2 were given. Any thoughts? Specifically, the error is: TypeError: __init__() takes at least 4 arguments (2 given) – sometimesiwritecode May 27 '17 at 19:45
  • I was missing the other args; you can just pass empty strings for them: `storage.Objects('test-monkeys-example', '', '')`. You could also use a Bucket object: `storage.Bucket('test-monkeys-example').objects()` – yelsayed May 28 '17 at 09:22