
Goal

I want to download a large zip file from an online database into a GCP bucket. I'm not very technical when it comes to GCP and working in the terminal, and I've stumbled upon some issues which I haven't been able to fix.

What I've tried

I tried to do the above in a few different ways. First, I tried the following command in Cloud Shell on GCP: curl -O https://website/file.zip | gsutil cp - gs://bucke/file.zip. That didn't work, so I tried it in the Google Cloud SDK Shell on my Windows computer and got the following output (curl without gsutil gives the same output):

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  1 43.5G    1  735M    0     0   239k      0 52:55:07  0:52:23 52:02:44     0
curl: (56) Send failure: Connection was reset
'count' is not recognized as an internal or external command,
operable program or batch file.
Copying from <STDIN>...
/ [1 files][    0.0 B/    0.0 B]
Operation completed over 1 objects.

Lastly, I tried the curl command in an Ubuntu VM, and it worked fine. The only problem with that is that I don't have sufficient permissions to upload the files into a GCP bucket with gsutil from the VM (I get a 403 error, also when uploading other files).

Hypothesized issue

I've noticed I'm not the only one with a similar problem, so I looked into several proposed solutions (1,2,3). One of them suggests the issue must lie with my system, since the command works fine in a VM. When I download & upload a small file from a different website using the Cloud SDK Shell on my computer (with the same command), it works fine. Downloading a much smaller zip file from the same website doesn't return the previous error, but this instead:

 % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 12.2M  100 12.2M    0     0   313k      0  0:00:40  0:00:40 --:--:--  354k

'count' is not recognized as an internal or external command,
operable program or batch file.
Copying from <STDIN>...
/ [1 files][    0.0 B/    0.0 B]      0.0 B/s
Operation completed over 1 objects.

and when I download the zip file back from the GCP bucket, it turns out to be an invalid zip file.

So apparently the problem lies in the connectivity between the website and my computer (another computer works with the same website, and another website works with the same computer). I'm guessing this might be a firewall issue, but my knowledge about this is very limited. Can someone help me continue the troubleshooting from here? How do I figure out whether it is indeed a firewall issue, and how do I fix it (or find step-by-step information on how to do so)?
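From what I've read, curl can resume an interrupted download with -C -, so a retry loop might work around the connection resets. Below is only a sketch of that pattern: the download function simulates a transfer that fails twice before succeeding, and the real command it stands in for is shown in a comment (the URL is a placeholder).

```shell
# Sketch of a resume-and-retry download loop.
# In real use the download function body would be:
#   curl -C - -O "https://website/file.zip"
# (-C - tells curl to continue from where the previous attempt stopped.)
max_tries=5
try=0
download() {
  try=$((try + 1))
  # Simulated transfer: fails on the first two attempts, succeeds on the third.
  [ "$try" -ge 3 ]
}
until download; do
  if [ "$try" -ge "$max_tries" ]; then
    echo "giving up after $try attempts"
    break
  fi
  # A real script would back off here, e.g. sleep 30
done
echo "finished after $try attempts"
```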

Any help is much appreciated!

Mira

2 Answers


Remove -O from the curl command. With -O, the file is being saved locally (check for file.zip in your local directory) instead of being piped into gsutil.

curl https://website/file.zip | gsutil cp - gs://bucke/file.zip

The -O flag makes curl write its output to a file instead of STDOUT, so the next command in the pipeline (gsutil) receives nothing.
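You can reproduce the effect locally without curl at all; the file name below is just illustrative.

```shell
# Stand-in for the remote file.
printf 'hello zip bytes' > /tmp/file.zip

# Analogue of `curl URL | gsutil cp - ...`:
# the data goes to stdout, so the pipe receives all 15 bytes.
piped_bytes=$(cat /tmp/file.zip | wc -c)

# Analogue of `curl -O URL | gsutil cp - ...`:
# the data goes to a file and stdout stays empty, so the pipe carries 0 bytes --
# which is exactly the "[ 0.0 B/ 0.0 B ]" gsutil printed in your output.
empty_bytes=$(true | wc -c)

echo "piped: $piped_bytes bytes, with -O: $empty_bytes bytes"
```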

Anton

When you use streaming uploads for large files, it is recommended to write the data to a local file first [1]. So you could try the following.

First, download to a local file:

curl -O URL

Second, upload the file to your bucket [2]:

gsutil cp file gs://bucket
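Between the two steps, it may also be worth checking that the download actually completed; an easy check is comparing the local byte count against the size the server reports. The sketch below uses a local dummy file so it is self-contained; in practice the expected size would come from the Content-Length header (the file name and size here are placeholders).

```shell
# Stand-in for the downloaded archive.
printf '%s' 'dummy archive contents' > file.zip

# In practice, something like:
#   expected=$(curl -sI "$URL" | tr -d '\r' | awk '/[Cc]ontent-[Ll]ength/ {print $2}')
expected=22

actual=$(wc -c < file.zip)
if [ "$actual" -eq "$expected" ]; then
  echo "size ok ($actual bytes); safe to run: gsutil cp file.zip gs://bucket"
else
  echo "size mismatch: got $actual, expected $expected -- re-download first"
fi
```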

Also, maybe you could try a test with a small file:

curl "http://nginx.org/download/nginx-1.17.10.zip" | gsutil cp - "gs://bucke/nginx-1.17.10.zip"

[1] https://cloud.google.com/storage/docs/gsutil/commands/cp#streaming-transfers

[2] https://cloud.google.com/storage/docs/gsutil/commands/cp#copying-tofrom-subdirectories-distributing-transfers-across-machines