
I have two servers that communicate with each other. Server1 requests parts of a file from Server2 and stores the received data into one file. Server2 is supposed to receive each of these requests and stream the requested part back.

Suppose the files stored on Server2 are laid out in a directory as follows:

bigfile.gz
   bigfile.gz.part-0
   bigfile.gz.part-1
   bigfile.gz.part-2
   ......

So Server1 sends a request for part-0, then part-1, and so on to Server2; hence the loop that makes the requests.

Server 1 (code snippet)

    for (var i in requestInfo['blockName']) {
        var blockName = i;
        var IP = requestInfo['blockName'][i][0];
        var fileData = JSON.stringify({
            blockName: blockName,
            fileName: requestInfo['fileName']
        });
        makeRequest(fileData, IP);
    }

    function makeRequest(fileData, IP) {
        var fileName = JSON.parse(fileData).fileName;
        var options = {
            host: IP,
            port: 5000,
            path: '/read',
            method: 'POST',
            headers: {
                'Content-Type': 'application/json'
            }
        };

        var req = http.request(options, function(res) {
            var data = '';
            res.on('data', function(chunk) {
                data += chunk;
            });

            res.on('end', function() {
                console.log(data.length);
                //fs.appendFileSync(fileName, data);
                var writeStream = fs.createWriteStream(fileName, { flags: 'a' });
                writeStream.write(data);
                writeStream.end();
            });
        });

        req.write(fileData);
        req.end();
    }

Server 2 (code snippet)

app.post('/read', function(req, res) {
    var dataBody = req.body;
    fs.createReadStream(dataBody.fileName + '/' + dataBody.blockName).pipe(res);
});

The above works when I test it with a 100MB .txt file, but it fails with a 1GB .gz file; even when I test it with a .zip file, the final .zip generated on the Server1 side is the incorrect size.

I am not sure what I am doing wrong here, or whether there is an alternative solution.

EDIT:

Also, my Server1 crashes when dealing with the big 1GB .gz file.

RRP
  • You're treating all the contents as text - which is why it works fine for a text file but not a binary file! Found [this](https://stackoverflow.com/questions/14855015/getting-binary-content-in-node-js-using-request) which might help you. Also [this](https://stackoverflow.com/questions/17836438/getting-binary-content-in-node-js-with-http-request) is probably your answer. – Jamiec Mar 14 '18 at 07:55
  • I followed your suggested links, but I am seeing this error TypeError: "list" argument must be an Array of Buffers even though I am passing an array – RRP Mar 14 '18 at 08:27
  • @Jamiec it works now, I was setting res.setEncoding(...), don't need this. But a 1GB file crashes the app – RRP Mar 14 '18 at 08:32
  • You should stream the response directly to the file, that way you will keep the memory consumption at minimum – Alex Michailidis Mar 14 '18 at 12:52
  • @alex-rokabilis could you provide an example – RRP Mar 14 '18 at 16:48

1 Answer

Your main problem here is that you are treating your data as a string by appending chunks to it.

Rewritten, this should be:

var req = http.request(options, function(res) {
  var data = [];
  res.on('data', function(chunk) {
    data.push(chunk);
  });

  res.on('end', function() {
    fs.writeFile(fileName, Buffer.concat(data), function() {
      console.log("write end")
    });
  });
});

That way we are creating a big array of binary chunks, and when the download is complete we write the concatenation of all the chunks to a file.

But notice the word big

If you stick with this implementation you risk running out of memory, especially when dealing with large (>500MB) files.

Streams to the rescue

var req = http.request(options, function(res) {
  res.pipe(fs.createWriteStream(fileName)).on("close", function() {
    console.log("write end");
  });
});

With the above implementation the memory footprint stays low, because the moment you receive a chunk of data from the download, you write it to the file. That way you never keep the whole file in the program's memory.

Alex Michailidis