When I try to upload a gzip-compressed file to Cloud Storage using a Python script on a Cloud Shell instance, it always uploads an empty file.

Here's the code to reproduce the error:

import gzip
from google.cloud import storage

storage_client = storage.Client()

list=['hello', 'world', 'please', 'upload']

out_file=gzip.open('test.gz', 'wt')
for line in list:
    out_file.write(line + '\n')
out_file.close

out_bucket = storage_client.bucket('test-bucket')
out_blob = out_bucket.blob('test')
out_blob.upload_from_filename('test.gz')

It uploads only an empty file named 'test' to my bucket, which is not what I expect.

However, the file written in my Cloud Shell is not empty: when I run zcat test.gz, it shows the expected content:

hello
world
please
upload
Donnald Cucharo
Will
  • Hi OP. If my answer was useful, please consider upvoting it. If it answered your question, then accept it. That way others know that you've been (sufficiently) helped. Also see [What should I do when someone answers my question](https://stackoverflow.com/help/someone-answers)? – Donnald Cucharo Jun 16 '21 at 22:13
  • Hi @Dondi, I will check your answer next week as I can't test it right now, thanks – Will Jun 17 '21 at 14:22

1 Answer

To understand what's happening in your code, here's a description from the gzip docs:

Calling a GzipFile object’s close() method does not close fileobj, since you might wish to append more material after the compressed data.

This explains why a file object that is not closed affects the upload: the compressed data has not been fully written to disk when upload_from_filename reads the file. Here's a supporting answer that clarifies when that warning applies:

The warning about fileobj not being closed only applies when you open the file, and pass it to the GzipFile via the fileobj= parameter. When you pass only a filename, GzipFile "owns" the file handle and will also close it.
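
The two ownership cases described in that quote can be sketched as follows. This is a minimal illustration, not part of the original code: the temporary directory and the names owned_path, borrowed_path, and raw are assumptions made for the demo.

```python
import gzip
import os
import tempfile

tmp_dir = tempfile.mkdtemp()
owned_path = os.path.join(tmp_dir, "owned.gz")
borrowed_path = os.path.join(tmp_dir, "borrowed.gz")

# Case 1: only a filename is passed, so the gzip object owns the file handle
# and closing it (here via the with block) also closes the underlying file.
with gzip.open(owned_path, "wt") as gz:
    gz.write("hello\n")

# Case 2: an already-open file object is passed via fileobj=, so
# GzipFile.close() leaves that object open and the caller must close it.
raw = open(borrowed_path, "wb")
gz = gzip.GzipFile(fileobj=raw, mode="wb")
gz.write(b"hello\n")
gz.close()
raw_still_open = not raw.closed  # the caller's file object was not closed
raw.close()  # without this, buffered bytes may not have reached the disk yet

# Both files decompress to the same text once everything is closed.
with gzip.open(owned_path, "rt") as f:
    owned_content = f.read()
with gzip.open(borrowed_path, "rt") as f:
    borrowed_content = f.read()
```

In the second case, raw_still_open is True after gz.close(), which is the situation the docs warn about.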

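The solution is to pass only the filename to gzip.open (not a file object via the fileobj= parameter) and to write inside a with block, which closes the file, and therefore flushes the compressed data to disk, before the upload runs. Note also that out_file.close in your code is missing its parentheses, so close() is never actually called. Rewrite it like this: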

import gzip
from google.cloud import storage

storage_client = storage.Client()

lines = ['hello', 'world', 'please', 'upload']  # renamed to avoid shadowing the built-in list

# The with block closes the file on exit, so all data is on disk before the upload
with gzip.open('test.gz', 'wt') as f_out:
  for line in lines:
    f_out.write(line + '\n')

out_bucket = storage_client.bucket('test-bucket')
out_blob = out_bucket.blob('test.gz') # include file format in dest filename
out_blob.upload_from_filename("test.gz")
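
To see why the original script uploaded an empty object, the following sketch compares the on-disk size of the file before and after close() is actually called. It runs locally without Cloud Storage; the temporary path and the variable names are assumptions made for illustration.

```python
import gzip
import os
import tempfile

lines = ['hello', 'world', 'please', 'upload']
path = os.path.join(tempfile.mkdtemp(), 'test.gz')

out_file = gzip.open(path, 'wt')
for line in lines:
    out_file.write(line + '\n')

out_file.close  # no parentheses: only references the method, nothing is closed
size_before_close = os.path.getsize(path)  # what an upload at this point would read

out_file.close()  # actually writes the remaining compressed data and gzip trailer
size_after_close = os.path.getsize(path)

# Read the file back to confirm the content is complete after a real close()
with gzip.open(path, 'rt') as f:
    content = f.read()

print(size_before_close, size_after_close)
```

The size measured before close() is smaller than the size after it. That would also explain why zcat shows the full content afterwards: the file is most likely finalized when the interpreter exits, which is too late for the upload.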
Donnald Cucharo