
I'm retrieving a gzipped CSV file from an FTP server and storing it in Google Cloud Storage. I need another GCP service, Dataprep, to read this file. Dataprep works only with plain CSV; it can't unzip the file on the fly.

So, what would be the proper way to unzip it? Here is my code:

import FTPClient from 'ftp'
import { Storage } from '@google-cloud/storage'

const bucket = new Storage().bucket('my-bucket') // placeholder bucket name
const file = bucket.file(path)

// Wrapped in a Promise so `resolve` is defined
const transfer = new Promise((resolve, reject) => {
  const ftpServer = new FTPClient()
  ftpServer.on('ready', () => {
    ftpServer.get('/file.gz', (err, stream) => {
      if (err) return reject(err)
      stream.once('close', () => {
        ftpServer.end()
        resolve(true)
      })
      stream.pipe(
        file.createWriteStream({
          resumable: false,
          public: false,
          gzip: true
        })
      )
    })
  })
  ftpServer.connect({
    host: 'somehost.com',
    user: 'user',
    password: '******'
  })
})

I've seen this question, but I'm not sure it's the optimal solution. As far as I understand, that code reads the file into my server's memory and then writes it back, which seems like a huge waste of memory and traffic. Is there a better way to unzip it?

stkvtflw

2 Answers


I don't think you need to store the file ungzipped. You only need to set the correct content type; the content encoding is set to gzip automatically by the gzip: true option. Something like this:


stream.pipe(
  file.createWriteStream({
    contentType: 'text/csv', // text/csv for a CSV file
    resumable: false,
    public: false,
    gzip: true
  })
)

If the requester doesn't send the Accept-Encoding: gzip request header, the file is served uncompressed. This "decompressive transcoding" is described in the documentation.

guillaume blaquiere

Figured it out: use zlib.

import zlib from 'zlib'

...
// Decompress the FTP stream on the fly, so GCS stores plain CSV.
// Note: no gzip: true here — that would re-compress the data we
// just decompressed.
const unzipper = zlib.createGunzip()
stream.pipe(unzipper).pipe(
  file.createWriteStream({
    resumable: false,
    public: false
  })
)
...
stkvtflw