
How do I retrieve a file (or files) or a directory listing from Google Cloud Storage (GCS) if I have the gs:// URI?

For example, gs://dataflow-samples/shakespeare/* is used by the Google DataFlow example app MinimalWordCount. I assume it is publicly accessible because my app can read it. But how can I download the file(s) without writing a DataFlow app to do it?
I did see this post, but it only addresses how to identify the matching files, and it does so programmatically inside the DataFlow app, which is not what I am looking for.

Ideally, I would like to be able to download the files (or browse the directory) using my browser. Are there any plugins? Or is there a way to convert the gs:// URI into an http(s) URL?

If I can't do it via a browser, then what are my alternatives for downloading? Is it possible to download via my console using my account?

successhawk

3 Answers


Yes, GCS is pretty easy to use directly.

There are some well-defined URLs that you can use to download public GCS objects. If your object is gs://BucketName/ObjectName, then you can download it at this URL: https://storage.googleapis.com/BucketName/ObjectName.
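The mapping described above is mechanical, so it can be sketched in a few lines of Python. This is a minimal sketch, not an official helper: the function name `gs_to_https` is made up for illustration, and percent-encoding the object name is an assumption added here so that names containing spaces or other special characters still produce a valid URL.

```python
from urllib.parse import quote


def gs_to_https(gs_uri: str) -> str:
    """Convert gs://bucket/object to https://storage.googleapis.com/bucket/object.

    Hypothetical helper for illustration; only useful for publicly readable objects.
    """
    scheme = "gs://"
    if not gs_uri.startswith(scheme):
        raise ValueError(f"not a gs:// URI: {gs_uri!r}")
    # Everything up to the first "/" is the bucket; the rest is the object name.
    bucket, _, object_name = gs_uri[len(scheme):].partition("/")
    if not bucket or not object_name:
        raise ValueError(f"expected gs://bucket/object, got {gs_uri!r}")
    # Assumption: percent-encode the object name, keeping "/" readable.
    return f"https://storage.googleapis.com/{bucket}/{quote(object_name, safe='/')}"


if __name__ == "__main__":
    print(gs_to_https("gs://BucketName/ObjectName"))
```

The resulting URL can be pasted straight into a browser, provided the object is publicly readable.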

If you want to list objects in a bucket, such as finding all of the objects matching a pattern like gs://dataflow-samples/shakespeare/*, you'll want to use one of GCS's APIs. There are two, XML and JSON. Listing objects also requires that the bucket owner has granted list permission, either to anonymous users or to you specifically. If anonymous users have list permission, it's as simple as fetching https://storage.googleapis.com/dataflow-samples?prefix=shakespeare (to get XML results) or https://www.googleapis.com/storage/v1/b/dataflow-samples/o?prefix=shakespeare (to get JSON results). If you're going to be doing this more than once or twice, you'll also want to include an API key with your request.

Here's the API documentation for object listing:
https://cloud.google.com/storage/docs/xml-api/get-bucket-list (for XML)
https://cloud.google.com/storage/docs/json_api/v1/objects/list (for JSON)
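As a rough sketch of the listing requests described above, the following Python builds the two listing URLs and pulls object names out of a JSON listing. The function names are invented for illustration. It assumes anonymous list permission on the bucket, reads only the first page of results (it ignores the `nextPageToken` field), and does not attach an API key.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def list_url(bucket: str, prefix: str, api: str = "json") -> str:
    """Build the object-listing URL for the XML or JSON API (hypothetical helper)."""
    query = urlencode({"prefix": prefix})
    if api == "xml":
        return f"https://storage.googleapis.com/{bucket}?{query}"
    if api == "json":
        return f"https://www.googleapis.com/storage/v1/b/{bucket}/o?{query}"
    raise ValueError(f"api must be 'xml' or 'json', got {api!r}")


def object_names(listing: dict) -> list:
    """Extract object names from a parsed JSON listing; "items" is absent when nothing matches."""
    return [item["name"] for item in listing.get("items", [])]


def fetch_object_names(bucket: str, prefix: str) -> list:
    """Fetch the first page of a JSON listing anonymously (requires network access)."""
    with urlopen(list_url(bucket, prefix)) as response:
        return object_names(json.load(response))
```

Calling `fetch_object_names("dataflow-samples", "shakespeare")` would issue the JSON request shown above; it fails with an HTTP error if the bucket does not allow anonymous listing.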

Brandon Yarbrough

There is a command-line tool, gsutil, that can be used to interact with GCS. The semantics are very similar to working with your local filesystem.

The Google Cloud Developers console provides a storage browser that you can use to browse the contents of GCS buckets.

These instructions show how to use the console to browse public buckets.

Here's a link to browse the Dataflow samples. The URL is

https://console.cloud.google.com/storage/dataflow-samples
Jeremy Lewi

The other answers here point to the right sources of information. However, for a quick answer on how to retrieve a file from GCS using the console:

>gsutil cp [gs URL of the resource] [destination folder]

The utility help:

>gsutil help

If you don't have gsutil but want to install it look here: https://cloud.google.com/storage/docs/gsutil_install
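If you would rather drive the same `gsutil cp` command from a script, one possible sketch in Python follows. The helper names are made up for illustration, and the sketch assumes gsutil is already installed and on your PATH. Because the arguments are passed as a list without a shell, a wildcard such as `*` reaches gsutil unexpanded, and gsutil resolves it against the bucket itself.

```python
import shutil
import subprocess


def gsutil_cp_command(gs_uri: str, destination: str) -> list:
    """Build the argument list for `gsutil cp <gs URI> <destination>` (hypothetical helper)."""
    return ["gsutil", "cp", gs_uri, destination]


def download(gs_uri: str, destination: str) -> None:
    """Run gsutil cp; raises if gsutil is missing or the copy fails."""
    if shutil.which("gsutil") is None:
        raise RuntimeError("gsutil not found on PATH")
    # check=True turns a non-zero gsutil exit code into CalledProcessError.
    subprocess.run(gsutil_cp_command(gs_uri, destination), check=True)
```

For example, `download("gs://dataflow-samples/shakespeare/*", ".")` would copy every matching object into the current directory.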

dmitri