
How can one download files from a GCP Storage bucket to a Container-Optimized OS (COS) instance on startup?


I know of the following solutions:

Yet all of these have to be done manually and externally after an instance is started.

There is also cloud-init, yet I can't find any info on how to copy files from a Storage bucket with it. Examples seem to suggest that it's better to include the content of the files directly in the cloud-init file, which is not something I want to do for security reasons. Is it possible to download files from a Storage bucket using cloud-init?

I considered using a startup script, yet COS ships without CLI tools such as gcloud or gsutil, so a startup script cannot run those commands directly.

I know I could copy the files manually and then save the image as a boot disk, but I'm hoping there are solutions that avoid having to do so.

Most of all, I'm assuming I'm not asking for something impossible, given that the COS instance setup allows me to specify Docker volumes to mount onto the starting container. This seems to suggest I should be able to have some private files on the instance by the time COS attempts to run my image on startup. But how?

[Image: gcp_volume_mount]


Trying to execute a startup-script with a cloud-sdk image and copying files there as suggested by Guillaume didn't work for me for a while, showing this log. Eventually I realised that the cloud-sdk image is 2.41GB when uncompressed and takes over 2 minutes to complete pulling. I tried again with an empty COS instance and the startup script completed successfully, downloading the data from a Storage bucket.

However, a 2.41GB image and over 2 minutes of boot time sound like overkill for downloading a 2KB file, don't they?

I'm glad to see a working solution to my question (thanks Guillaume!) although I'm still wondering: isn't there a nicer way to do this? I feel that this method is even less tidy than manually putting the files on the COS instance and then creating a machine image to use in the future.

Voy

2 Answers

4

Based on Guillaume's answer I created and published a gsutil wrapper image, available as voyz/gsutil_wrap. This way I am able to run a startup-script with the following command:

docker run -v /host/path:/container/path \
  --entrypoint gsutil voyz/gsutil_wrap \
  cp gs://bucket/path /container/path

It's essentially a copy of what Guillaume suggested, except that it uses an image containing only the minimum setup required to run gsutil. As a result it weighs 0.22GB and pulls within 10-20 seconds on average, as opposed to 2.41GB and over 2 minutes respectively for the google/cloud-sdk image suggested by Guillaume.

Also, credit to this incredibly useful StackOverflow answer that allows gsutil to use the default service account for authentication.

Voy
2

The startup script is the right place to do this. And YES, COS lacks some useful tools.

BUT you can run containers! For example, the Google Cloud SDK container!

So, add this startup script to the VM metadata:

  • key -> startup-script
  • value ->
docker run -v /local/path/to/copy/files:/dummy/container/path \
  --entrypoint gsutil google/cloud-sdk \
  cp gs://your_bucket/path/to/file /dummy/container/path
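
One way to set that metadata entry from a workstation is sketched below. It assumes gcloud is installed locally; the helper names write_startup_script and attach_startup_script, the file name and the instance/zone values are hypothetical and only serve the illustration.

```shell
# Sketch: save the docker command to a file and attach it to an existing
# VM under the startup-script metadata key.
# Function names, file name, instance and zone are placeholders.

write_startup_script() {
  local out="$1"
  # Quoted 'EOF' keeps the backslashes and paths literal.
  cat > "$out" <<'EOF'
#!/bin/bash
docker run -v /local/path/to/copy/files:/dummy/container/path \
  --entrypoint gsutil google/cloud-sdk \
  cp gs://your_bucket/path/to/file /dummy/container/path
EOF
}

attach_startup_script() {
  local instance="$1" zone="$2" script="$3"
  gcloud compute instances add-metadata "$instance" \
    --zone "$zone" \
    --metadata-from-file startup-script="$script"
}

# Example (adjust names before running):
# write_startup_script startup.sh
# attach_startup_script my-cos-vm europe-west1-b startup.sh
```

The script only runs on the next boot, so the instance would need a restart after the metadata is attached.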

Note: the startup script runs as root. Perform a chmod/chown in your startup script if you need to change the ownership or access mode of the downloaded files.
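
The note above can be sketched as a small startup-script helper that downloads the object and then hands the files over to a non-root user. The function name fetch_from_bucket, the owner 1000:1000 and all paths are assumptions for illustration. On COS the host directory would also need to sit somewhere writable (for example under /var or /home), since the root filesystem is read-only.

```shell
# Sketch of a startup-script helper: download, then fix ownership and mode.
# fetch_from_bucket is a hypothetical name; all arguments are placeholders.

fetch_from_bucket() {
  local src="$1"       # e.g. gs://your_bucket/path/to/file
  local host_dir="$2"  # writable directory on the host that receives the file
  local owner="$3"     # user:group that should own the files afterwards

  mkdir -p "$host_dir" || return 1
  # gsutil runs inside the container and writes into the bind-mounted
  # host directory; --rm removes the stopped container afterwards.
  docker run --rm -v "$host_dir":/dummy/container/path \
    --entrypoint gsutil google/cloud-sdk \
    cp "$src" /dummy/container/path || return 1
  # The startup script runs as root, so the files end up owned by root.
  chown -R "$owner" "$host_dir" || return 1
  # Remove all access for group and others.
  chmod -R go-rwx "$host_dir"
}

# Example (adjust before use):
# fetch_from_bucket gs://your_bucket/path/to/file /var/my_files 1000:1000
```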

Let me know if you need more explanation on this command line


Of course, with a fresh COS image the startup time is quite long, because the container image has to be pulled and extracted.

To reduce the startup time, you can "bake" your image: start with a COS instance, download or install what you want on it (or only perform a docker pull of the google/cloud-sdk container), and create a custom image from it.

That way, all the required dependencies are already present in the image and the boot is quicker.
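
A rough sketch of the baking steps, run from a workstation with gcloud installed. The function name bake_cos_image and the argument values are hypothetical, and the sketch assumes the boot disk has the same name as the instance, which may not hold for every setup.

```shell
# Sketch: pre-pull the container on a running COS VM, then turn its boot
# disk into a custom image. Names and zone are placeholders.

bake_cos_image() {
  local instance="$1" zone="$2" image="$3"

  # Pull the container on the VM so it is stored on the boot disk.
  gcloud compute ssh "$instance" --zone "$zone" \
    --command "docker pull google/cloud-sdk" || return 1

  # Stop the VM so the disk is in a consistent state.
  gcloud compute instances stop "$instance" --zone "$zone" || return 1

  # Assumption: the boot disk is named after the instance.
  gcloud compute images create "$image" \
    --source-disk "$instance" --source-disk-zone "$zone"
}

# Example (adjust before use):
# bake_cos_image my-cos-vm europe-west1-b cos-with-cloud-sdk
```

New instances created from the resulting image would then start with the container already present, at the cost of having to rebuild the image whenever the container or COS is updated.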

guillaume blaquiere
  • This indeed worked! It took me a bit of back and forth due to the image size being quite large (2.41GB) and the instance maxing out on space - in case you'd want to include that info in your answer. I've updated my question in response to your answer, would you be able to follow up? Most of all though, thanks a lot for your solution! Glad to hear there at least is some way of doing it. – Voy Oct 28 '20 at 09:44
  • Understood. I provided a solution. Remember to periodically rebuild the image and update the cloud-sdk container version, so that you have the latest patches and avoid security issues. – guillaume blaquiere Oct 28 '20 at 12:49
  • Thanks for following up! I try to avoid relying on an image, as this indeed creates a chore of updating it, a security risk and is not recommended in the GCP docs, although I appreciate you outlining the method in detail. I ended up publishing a wrapper image for gsutil and use that instead - check my answer on this question. No doubt credit goes to you for pointing me in the right direction though! – Voy Oct 29 '20 at 05:17