
I have two Axios-based functions: downloadContent and uploadContent.

The downloadContent function returns the data as a stream, and the uploadContent function uses that stream to upload it.

The idea is essentially to stream data between two backends. But since some of the files can be fairly large (5-10 GB+), I don't want to download the whole file into memory and then upload it.

Yet that's exactly what I'm observing with the approach/code below: the process memory usage climbs steadily until it roughly reaches the file size.
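
(For reference, the growth is easy to see by simply logging process.memoryUsage() on an interval while the transfer runs; a rough sketch, not part of the actual code:)

// Log the resident set size every second while the transfer runs.
const timer = setInterval(() => {
  const { rss } = process.memoryUsage();
  console.log(`rss: ${(rss / 1024 / 1024).toFixed(1)} MB`);
}, 1000);
// clearInterval(timer) once the upload has finished.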

async function downloadContent(downloadUrl) {
  return await axios
    .get(downloadUrl, {
      responseType: "stream",
    })
    .then((r) => ({
      file: r.data,
      someOtherProps: {},
    }));
}

async function uploadContent() {
  const download = await downloadContent("download/url");

  return await axios.post("upload/url", download.file, {
    headers: {
      "Content-Type": "specific-content-type",
    },
  });
}

await uploadContent();

Is there something I'm doing wrong?

And in general, how can I stream data between two servers with Axios while minimizing the memory footprint?

Stefan Stoichev

1 Answer


It is possible that the stream is being buffered in memory, which is why you are observing an increase in memory usage.

In the original code, you have the downloadContent function, which gets data as a stream and returns it. However, when you call the uploadContent function, you are passing the stream directly into the Axios post method.

return await axios.post("upload/url", download.file, {
    headers: {
      "Content-Type": "specific-content-type",
    },
});

The Axios library, by default, buffers the entire input before making the HTTP request. When you pass the stream directly as the data parameter (download.file) to the axios.post method, Axios waits for the entire stream to be consumed (buffered in memory) before it actually makes the HTTP request.
This is because Axios is designed to work with both browsers and Node.js, and in a browser environment, streams cannot be sent as request data directly.
Therefore, Axios buffers the stream in memory to ensure compatibility across environments. This is what leads to high memory usage for large files.

In addition, Axios lets you transform request data before it is sent to the server; applying such a transform also requires the full request body to be held in memory.
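
If you want to rule that transform step out explicitly, Axios accepts a custom transformRequest; an identity transform (sketch below, an addition on my part rather than something from the original code) hands the body to the adapter untouched. Whether this alone avoids the buffering depends on the Axios version and adapter.

// Sketch: identity transform so nothing rewrites or serializes the stream.
await axios.post("upload/url", download.file, {
  headers: {
    "Content-Type": "specific-content-type",
  },
  // Return the data as-is instead of letting the default transform run.
  transformRequest: [(data) => data],
});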


To avoid buffering the entire file in memory, you can use the stream as a pipe to upload the file while it is being downloaded. This way, you are essentially passing the data through without holding onto it.

Since Axios does not support being used as a writable stream, you cannot use the pipe method to pipe data directly into axios.post.
Instead, pass the readable stream as the data parameter to axios.post, much like you were doing originally, and configure the request so the large body is not rejected (maxContentLength / maxBodyLength below).

const axios = require('axios');

async function downloadContent(downloadUrl) {
  // Request the body as a stream so it is not read into memory up front.
  const response = await axios.get(downloadUrl, { responseType: 'stream' });
  return response.data;
}

async function uploadContent(uploadUrl, downloadStream) {
  try {
    // The readable stream is used directly as the request body,
    // so chunks flow from the download to the upload as they arrive.
    await axios.post(uploadUrl, downloadStream, {
      headers: {
        'Content-Type': 'specific-content-type',
      },
      // Lift Axios' default size limits so large bodies are not rejected.
      maxContentLength: Infinity,
      maxBodyLength: Infinity
    });
    console.log('Upload successful.');
  } catch (error) {
    console.error('An error occurred:', error);
  }
}

(async () => {
  const downloadStream = await downloadContent('download/url');
  await uploadContent('upload/url', downloadStream);
})();

This code downloads content as a stream and then uploads it as a stream. The key is that we are passing the readable stream as the data parameter to axios.post. This should work in a Node.js environment.
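
If you prefer an explicit pipe step (this came up in the comments below), the same idea can be written with a PassThrough stream in the middle. This is a sketch of an equivalent wiring, with a hypothetical relayContent helper, not a different mechanism: you pipe the download into the PassThrough and hand its readable side to axios.post as the data parameter.

const { PassThrough } = require('stream');

async function relayContent(downloadUrl, uploadUrl) {
  const response = await axios.get(downloadUrl, { responseType: 'stream' });

  // Pipe the download through a PassThrough and hand that readable side
  // to axios.post, so chunks flow through without being accumulated.
  const pass = new PassThrough();
  response.data.pipe(pass);

  return axios.post(uploadUrl, pass, {
    headers: { 'Content-Type': 'specific-content-type' },
    maxContentLength: Infinity,
    maxBodyLength: Infinity,
  });
}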

VonC
  • Thank you very much for the great explanation! – Stefan Stoichev Jun 28 '23 at 18:52
  • @StefanStoichev Can you confirm you did observe a more reasonable memory consumption? – VonC Jun 28 '23 at 18:54
  • I can definitely see that the memory consumption is low (and more consistent). I've tested it by passing a StreamWriter in the PassThrough part. But if I try to pass the axios.post there, the IDE complains that `'Promise' is not assignable to parameter of type 'WritableStream'`, and if I actually run the code the error is `dest.on is not a function`. – Stefan Stoichev Jun 28 '23 at 20:14
  • @StefanStoichev OK. I have rewritten the second part of the answer to propose an alternative approach. – VonC Jun 28 '23 at 20:56