
Does anyone know a good way to load a set of local files into the Java dev_appserver's emulated Cloud Storage?

This didn't work:

$ gsutil rsync gs://mybucket http://localhost:8888/mybucket
InvalidUrlError: Unrecognized scheme "http".

I'm open to suggestions on either:

  1. How to load a bunch of files locally (preferably through gsutil)
  2. How to point my local dev_appserver to a non-emulated bucket at Google

It is painful to test things locally without proper data. I'm trying to write some transformations to load data into BigQuery (from Datastore backups), and that won't be possible without some real data.

BK-
  • To copy data to a local dir with `rsync`: `gsutil rsync -d gs://mybucket/data /localdir`. – Alex Martelli Feb 25 '15 at 04:46
  • Local copy isn't an issue, what I need is a way to load this into my dev_appserver. – BK- Feb 25 '15 at 04:51
  • `dev_appserver`**.py** can be told what directories to use for local files, via command-line flags -- but, again, I don't know enough about Java to help. – Alex Martelli Feb 25 '15 at 04:55

1 Answer


"How to point my local dev_appserver to a non-emulated bucket at Google": it's not documented all that clearly, but it is implemented in the dev_appserver and cloudstorage.

To verify what I'm saying, first svn checkout http://appengine-gcs-client.googlecode.com/svn/trunk/python gcs-client to get cloudstorage's source code onto your machine (you'll need to install subversion if you don't have it already, but, that's free too:-).

Then, cd gcs-client/src/cloudstorage/ and look at storage_api.py. In the very first function _get_storage_api, the docstring says:

On dev appserver, this instance by default will talk to a local stub
unless common.ACCESS_TOKEN is set. That token will be used to talk
to the real GCS.

So, look at common.py, and again in the first function, set_access_token, you'll see:

Args: access_token: you can get one by run 'gsutil -d ls' and copy the str after 'Bearer'.

So there you are -- in every entry point to your app (best in appengine_config.py in your root directory), import cloudstorage's common module, then **if and only if** you're on dev_appserver[*], call

common.set_access_token('whatever_the_token')

using as the argument the string that appears right after Bearer in the output of 'gsutil -d ls'; among much else, you'll spot something like (faking and way shortening the actual value...:-):

Bearer xy15.WKXJQEzXPQQy2dt7qK9\r\n

in which case you'd be calling

common.set_access_token('xy15.WKXJQEzXPQQy2dt7qK9')
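If you'd rather not copy the token by hand, here is a minimal sketch of pulling it out of saved 'gsutil -d ls' debug output. The helper name extract_bearer_token is hypothetical (it is not part of gsutil or cloudstorage), and the pattern assumes the token is followed either by real whitespace or by a literal backslash-escaped \r\n, as in the sample above.

```python
import re

# Hypothetical helper: matches the run of characters after "Bearer",
# stopping at whitespace, a backslash (escaped \r\n) or a quote.
_BEARER_RE = re.compile(r'''Bearer\s+([^\s\\'"]+)''')


def extract_bearer_token(debug_output):
    """Return the string after 'Bearer' in the given text, or None."""
    match = _BEARER_RE.search(debug_output)
    return match.group(1) if match else None
```

You would pass it the captured debug text and hand the result to common.set_access_token. Tokens like this are usually short-lived, so expect to repeat the step now and then.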

[*] There are many ways to find out if you're on dev_appserver; e.g., see "GAE: python code to check if i'm on dev_appserver or deployed to appspot".
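Putting the pieces together, a sketch of what could go in appengine_config.py. The function names on_dev_appserver and use_real_gcs are made up for illustration, and the check assumes the commonly used convention that the Python dev_appserver sets SERVER_SOFTWARE to a value starting with "Development"; any other detection method would do just as well.

```python
import os


def on_dev_appserver(environ=os.environ):
    # Assumption: the local dev server reports SERVER_SOFTWARE as
    # "Development/..." while deployed apps report something else.
    return environ.get('SERVER_SOFTWARE', '').startswith('Development')


def use_real_gcs(token, environ=os.environ, set_token=None):
    """Set the GCS access token only when running on dev_appserver.

    Returns True if the token was set, False otherwise.
    """
    if not on_dev_appserver(environ):
        return False
    if set_token is None:
        # Imported lazily so this path only runs on the dev server.
        from cloudstorage import common
        set_token = common.set_access_token
    set_token(token)
    return True
```

In appengine_config.py you would then call use_real_gcs('xy15.WKXJQEzXPQQy2dt7qK9') with your own token; the environ and set_token parameters exist only so the guard can be exercised without the SDK installed.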

Alex Martelli
  • Thanks for the quick reply Alex, really appreciate it. I did find previous answers to this for Python, but not Java. Sorry if that was confusing in my question, I tried to bold it to make it clear. Do you know of the same for the Java SDK? – BK- Feb 25 '15 at 04:48
  • @BK-, sorry, my Java is weak -- somebody with strong Java should surely be able to find a similar approach in the Java SDK's sources, as I did in the Python SDK's sources! If you **only** want Java answers, add Java to your tags -- this way, those of us who know too little Java to help will know not to try to help you... – Alex Martelli Feb 25 '15 at 04:52
  • Thanks for the suggestion Alex. Added the tag! :-) – BK- Feb 25 '15 at 05:19
  • @BK- When you say you found answers for python, was this for your second question, or your first as well? I also want to preload some GCS data on my dev appserver, but can't find a way. – Remko Jul 22 '15 at 19:54
  • Neither was answered. Sorry! :( – BK- Jul 23 '15 at 20:36