
I'm learning how to do large file manipulation with Node and streams, and I'm stuck in the middle of a file change: when I pass the result down to a module, the process still seems to be working from memory by the time it reaches the next module.

I download a zip from an S3 bucket locally and unzip the contents:

const fs = require('fs-extra')
const unzipper = require('unzipper')

try {
  const stream = fs.createReadStream(zipFile).pipe(unzipper.Extract({ path }))
  stream.on('error', err => console.error(err))

  stream.on('close', async () => {
    fs.removeSync(zipFile) // the archive itself is no longer needed

    try {
      const neededFile = await dir(path) // delete files not needed from zip, rename and return named file
      await mod1(neededFile) // review file, edit and return info
      await mod2(neededFile, data) // pass down data for further changes
      return
    } catch (err) {
      console.error(err)
    }
  })
} catch (err) {
  console.error(err)
}
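
For what it's worth, the flow I'm after is strictly sequential: extract fully to disk, then clean up, then edit, then search. The way I picture expressing that is the extraction wrapped in a Promise so every step can be awaited in order; this is only a rough sketch of the same flow as above (dir, mod1 and mod2 are my modules, and I'm assuming mod1 resolves with the data that mod2 needs):

const fs = require('fs-extra')
const unzipper = require('unzipper')

const extract = (zipFile, path) =>
  new Promise((resolve, reject) => {
    fs.createReadStream(zipFile)
      .pipe(unzipper.Extract({ path }))
      .on('close', resolve) // extraction is fully flushed to disk at this point
      .on('error', reject)
  })

const run = async (zipFile, path) => {
  await extract(zipFile, path)
  fs.removeSync(zipFile)
  const neededFile = await dir(path) // delete unneeded files, rename and return the file
  const data = await mod1(neededFile) // review and edit the file, return info
  await mod2(neededFile, data) // search the edited file
}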

During the initial unzip I learned that there is a difference between listening for the stream's close and finish events: with finish I could already pass the file to the first module and start the manipulation, but (I guess due to the size) the output and the file on disk never matched. After cleaning out the files I don't need, I pass the renamed file to mod1 for changes and write it back with writeFileSync:

mod1.js:

const fs = require('fs-extra')

module.exports = file => {
  // readFile is asynchronous: this callback runs after the exported function has already returned
  fs.readFile(file, 'utf8', (err, data) => {
    if (err) return console.error(err)
    try {
      const result = data.replace(/: /gm, ':').replace(/(?<=location:")foobar(?=")/gm, '')
      fs.writeFileSync(file, result) // write the edited contents back to the same file
    } catch (err) {
      console.error(err)
      return err
    }
  })
}
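
As an aside, my understanding is that fs-extra's readFile and writeFile also return promises when no callback is passed, so a version of mod1 that finishes the whole read–edit–write before it resolves would look roughly like this (just a sketch of what I mean by the edit being done before moving on, not my current code):

const fs = require('fs-extra')

module.exports = async file => {
  const data = await fs.readFile(file, 'utf8')
  const result = data.replace(/: /gm, ':').replace(/(?<=location:")foobar(?=")/gm, '')
  await fs.writeFile(file, result) // resolves only once the write has completed
}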

When I tried to do the same replacement with a read stream and a write stream on the same file instead, the result was always a blank file:

  const readStream = fs.createReadStream(file)
  const writeStream = fs.createWriteStream(file)
  readStream.on('data', chunk => {
    const data = chunk.toString().replace(/: /gm, `:`).replace(/(?<=location:")foobar(?=")/gm, '')
    writeStream.write(data)
  })
  readStream.on('end', () => {
    writeStream.close()
  })
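
I assume that blank result is because fs.createWriteStream truncates the file as soon as it opens, while the read stream is still reading from the same path. The pattern I've seen suggested for this kind of in-place edit is streaming into a temporary file and renaming it over the original once the write has finished; a rough sketch of what I mean (the .tmp suffix is just something I made up for illustration):

const fs = require('fs-extra')

const editFile = file =>
  new Promise((resolve, reject) => {
    const tmp = `${file}.tmp`
    const readStream = fs.createReadStream(file, 'utf8')
    const writeStream = fs.createWriteStream(tmp)

    readStream.on('data', chunk => {
      // note: a match could in principle be split across chunk boundaries
      writeStream.write(chunk.replace(/: /gm, ':').replace(/(?<=location:")foobar(?=")/gm, ''))
    })
    readStream.on('error', reject)
    readStream.on('end', () => writeStream.end())

    writeStream.on('error', reject)
    writeStream.on('close', () => {
      fs.moveSync(tmp, file, { overwrite: true }) // swap in the edited copy only after it is fully written
      resolve()
    })
  })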

After the writeFileSync in mod1 I move on to the next module to search for a line reference:

mod2.js:

const fs = require('fs-extra')

module.exports = (file, data) => {
  const parseFile = fs.readFileSync(file, 'utf8')
  // log the 1-based number of every line that contains the search string
  parseFile.split(/\r?\n/).forEach((line, idx) => {
    if (line.includes(data)) console.log(idx + 1)
  })
}

but the line number returned is that of the initially unzipped file, not of the file after it was modified by the first module. Since I thought the sync write would apply to the file on disk, it looks as if the file being referenced is still an in-memory copy? After going through what I could find about streams while learning them, these are my questions:

How should a file be manipulated after an unzip stream, and why does the second module see the file as it was right after unzipping rather than after it has already been manipulated? Is it possible to write multiple streams synchronously, one after another?
