
I'm retrieving a gzipped CSV file from an FTP server and storing it in Google Cloud Storage. I need another GCP service, Dataprep, to read this file. Dataprep works only with plain CSV; it can't unzip the file on the fly.

So, what would be the proper way to unzip it? Here is my code:

import FTPClient from 'ftp'
import { Storage } from '@google-cloud/storage'

const bucket = new Storage().bucket('my-bucket') // placeholder bucket name
const file = bucket.file(path)

// Wrapped in a Promise so `resolve` is defined
const transfer = new Promise((resolve, reject) => {
  const ftpServer = new FTPClient()
  ftpServer.on('ready', () => {
    ftpServer.get('/file.gz', (err, stream) => {
      if (err) return reject(err)
      stream.once('close', () => {
        ftpServer.end()
        resolve(true)
      })
      stream.pipe(
        file.createWriteStream({
          resumable: false,
          public: false,
          gzip: true
        })
      )
    })
  })
  ftpServer.connect({
    host: 'somehost.com',
    user: 'user',
    password: '******'
  })
})

I've seen this question, but I'm not sure it's the optimal solution. As far as I understand, that code reads the file into my server's memory and then writes it back, which seems like a huge waste of memory and traffic. Is there a better way to unzip it?

stkvtflw

2 Answers


I don't think you need to store the file ungzipped. You only need to set the correct content type; the content encoding is set to gzip automatically by the gzip: true option. Something like this:


stream.pipe(
  file.createWriteStream({
    contentType: 'text/csv', // text/csv for a CSV file
    resumable: false,
    public: false,
    gzip: true
  })
)

If the requester doesn't send the Accept-Encoding: gzip request header, the file is served uncompressed. This "decompressive transcoding" is described in the documentation.

guillaume blaquiere

Figured it out: use zlib.

import zlib from 'zlib'

...
// Decompress the FTP stream on the fly, so GCS stores plain CSV.
// Note: no gzip: true here — that would re-compress the data we
// just decompressed.
const unzipper = zlib.createGunzip()
stream.pipe(unzipper).pipe(
  file.createWriteStream({
    resumable: false,
    public: false
  })
)
...
stkvtflw