Using Express with Node, I can upload a file successfully and pass it to Azure storage in the following block of code.

app.get('/upload', function (req, res) {
    res.send(
    '<form action="/upload" method="post" enctype="multipart/form-data">' +
    '<input type="file" name="snapshot" />' +
    '<input type="submit" value="Upload" />' +
    '</form>'
    );
});

app.post('/upload', function (req, res) {
    var path = req.files.snapshot.path;
    var bs= azure.createBlobService();
    bs.createBlockBlobFromFile('c', 'test.png', path, function (error) { });
    res.send("OK");
});

This works just fine, but Express creates a temporary file and stores the image first, then I upload it to Azure from the file. This seems like an inefficient and unnecessary step in the process and I end up having to manage cleanup of the temp file directory.

I should be able to stream the file directly to Azure storage using the blobService.createBlockBlobFromStream method in the Azure SDK, but I am not familiar enough with Node or Express to understand how to access the stream data.

app.post('/upload', function (req, res) {

    var stream = /// WHAT GOES HERE ?? ///

    var bs= azure.createBlobService();
    bs.createBlockBlobFromStream('c', 'test.png', stream, function (error) { });
    res.send("OK");
});

I have found the following blog which indicates that there may be a way to do so, and certainly Express is grabbing the stream data and parsing and saving it to the file system as well. http://blog.valeryjacobs.com/index.php/streaming-media-from-url-to-blob-storage/

vjacobs' code is actually downloading a file from another site and passing that stream to Azure, so I'm not sure if it can be adapted to work in my situation.

How can I access the uploaded file's stream and pass it directly to Azure using Node?

Charlie Brown
  • I'm not familiar with node but the [Github page](https://github.com/WindowsAzure/azure-sdk-for-node) may provide a hint. `var stream = fs.createReadStream(req.files.snapshot.path);` where `fs` is defined as `var fs = require('fs');` – Dustin Kingen Aug 21 '13 at 17:06
  • @Romoku unfortunately, `req.files.snapshot.path` is the path to the file on disk. I need to capture it before it becomes an actual file. – Charlie Brown Aug 21 '13 at 17:18
  • From what I have read, the [`bodyParser`](http://expressjs.com/api.html#bodyParser) middleware handles writing the file to disk. As far as I can tell, you'll need to implement your own middleware in order to intercept the file before it gets written to disk. – Dustin Kingen Aug 21 '13 at 17:25
  • Also consider the implication of not writing the file to disk. You'll lose durability and increase the application memory footprint. – Dustin Kingen Aug 21 '13 at 17:31

4 Answers

SOLUTION (based on discussion with @danielepolencic)

Using Multiparty (npm install multiparty), a fork of Formidable, we can access the multipart data if we disable the bodyParser() middleware from Express (see their notes on doing this for more information). Unlike Formidable, Multiparty will not stream the file to disk unless you tell it to.

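app.post('/upload', function (req, res) {
    var blobService = azure.createBlobService();
    var form = new multiparty.Form();
    form.on('part', function(part) {
        if (part.filename) {
            // File part: stream it to Azure without writing it to disk.
            // See the comments below about whether byteOffset should be subtracted.
            var size = part.byteCount - part.byteOffset;
            var name = part.filename;

            blobService.createBlockBlobFromStream('c', name, part, size, function(error) {
                if (error) {
                    // The response has already been sent below, so log the
                    // error rather than calling res.send a second time.
                    console.error({ Grrr: error });
                }
            });
        } else {
            form.handlePart(part);
        }
    });
    form.parse(req);
    res.send('OK');
});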

Props to @danielepolencic for helping to find the solution to this.

Charlie Brown
  • Just to note, according to Multiparty's repo on github, https://github.com/andrewrk/node-multiparty/ , they claim busboy is a newer, faster alternative that may be worth looking into: https://github.com/mscdex/busboy – chrisco512 Jan 09 '15 at 03:38
  • The benchmarks claim busboy (formerly dicer) is more than twice as fast as multiparty and formidable: https://github.com/mscdex/dicer/wiki/Benchmarks – chrisco512 Jan 09 '15 at 03:48
  • Just a heads up: with this solution, I was finding that the tail end of the image wasn't uploading correctly. The solution was to *not* subtract the byteOffset from the size. – Soroush Khanlou Jun 19 '15 at 15:17
  • @SoroushKhanlou Quite possible, as this answer is almost 2 years old. – Charlie Brown Jun 19 '15 at 18:52
  • Is there any sample code on how to achieve the same with busboy? I can't get it to work, unfortunately :( – Qiong Wu Jul 27 '15 at 11:07
  • This method seems to fail when used with large files. Any ideas? – RonLugge Oct 30 '15 at 22:31

As you can read in the Connect middleware documentation, bodyParser automagically handles the form for you. In your particular case, it parses the incoming multipart data, stores it somewhere else, and then exposes the saved file in a convenient format (i.e. req.files).

Unfortunately, we do not need (or necessarily like) black magic here, primarily because we want to stream the incoming data to Azure directly without hitting the disk (i.e. req.pipe(res)). Therefore, we can turn off the bodyParser middleware and handle the incoming request ourselves. Under the hood, bodyParser uses node-formidable, so it may be a good idea to reuse it in our implementation.

var express = require('express');
var formidable = require('formidable');
var azure = require('azure'); // assumed package name for the Azure SDK for Node
var app = express();

// app.use(express.bodyParser({ uploadDir: 'temp' }));

app.get('/', function(req, res){
  res.send('hello world');
});

app.get('/upload', function (req, res) {
    res.send(
    '<form action="/upload" method="post" enctype="multipart/form-data">' +
    '<input type="file" name="snapshot" />' +
    '<input type="submit" value="Upload" />' +
    '</form>'
    );
});

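app.post('/upload', function (req, res) {
  var bs = azure.createBlobService();
  var form = new formidable.IncomingForm();
  // Override onPart so each part is handed to us as a stream instead of
  // being written to disk.
  form.onPart = function(part){
    // 11 is a hard-coded size in bytes; it must match the actual size of the part.
    bs.createBlockBlobFromStream('taskcontainer', 'task1', part, 11, function(error){
      if(!error){
          // Blob uploaded
      }
    });
  };
  form.parse(req);
  res.send('OK');
});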

app.listen(3000);

The core idea is that we can leverage Node streams so that we don't need to load the full file into memory before sending it to Azure; instead, we can transfer it as it arrives. The node-formidable module supports streams, so piping the stream to Azure achieves our objective.
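
To make the streaming idea concrete, here is a minimal sketch that uses only Node's built-in stream module. The `countingSink` helper is hypothetical and stands in for any destination (such as an upload). It handles chunks one at a time as they arrive and keeps only running totals, so the whole file never has to sit in memory.

```javascript
const { PassThrough, Writable } = require('stream');

// Hypothetical stand-in for an upload target: it consumes chunks one at a
// time and only keeps running totals, never the data itself.
function countingSink(onDone) {
  let bytes = 0;
  let chunks = 0;
  const sink = new Writable({
    write(chunk, encoding, callback) {
      bytes += chunk.length;
      chunks += 1;
      callback();
    }
  });
  sink.on('finish', function () {
    onDone({ bytes: bytes, chunks: chunks });
  });
  return sink;
}

// Simulate an incoming upload that is written in three separate pieces.
const incoming = new PassThrough();
incoming.pipe(countingSink(function (stats) {
  console.log('received', stats.bytes, 'bytes in', stats.chunks, 'chunks');
}));
incoming.write(Buffer.from('first '));
incoming.write(Buffer.from('second '));
incoming.end(Buffer.from('third'));
```

In a real route the `PassThrough` would be replaced by the `part` stream, and the sink by whatever consumes the upload.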

You can easily test the code locally without hitting azure by replacing the post route with:

app.post('/upload', function (req, res) {
  var form = new formidable.IncomingForm();
    form.onPart = function(part){
      part.pipe(res);
    };
    form.parse(req);
});

Here, we're simply piping the request from the input to the output. You can read more about bodyParser here.
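
Another way to test locally is to swap the blob service for a stub. The sketch below is only an illustration: `createFakeBlobService` is a hypothetical helper, not part of the Azure SDK, and it merely mimics the `createBlockBlobFromStream(container, blob, stream, size, callback)` call shape used above. It drains the stream and reports an error if the number of bytes received differs from the declared size, which makes size mistakes visible without a network call. Unlike a real upload, the stub buffers the data in memory, which is acceptable for small test payloads only.

```javascript
const { PassThrough } = require('stream');

// Hypothetical in-memory stand-in for the blob service (not an Azure API).
function createFakeBlobService() {
  const blobs = {};
  return {
    blobs: blobs,
    createBlockBlobFromStream: function (container, blob, stream, size, callback) {
      const chunks = [];
      stream.on('data', function (chunk) { chunks.push(chunk); });
      stream.on('error', callback);
      stream.on('end', function () {
        const data = Buffer.concat(chunks);
        if (data.length !== size) {
          return callback(new Error('expected ' + size + ' bytes, got ' + data.length));
        }
        blobs[container + '/' + blob] = data;
        callback(null);
      });
    }
  };
}

// Usage: feed the stub a stream, as the upload route would.
const bs = createFakeBlobService();
const part = new PassThrough();
bs.createBlockBlobFromStream('taskcontainer', 'task1', part, 11, function (error) {
  console.log(error ? 'upload failed: ' + error.message : 'blob stored');
});
part.end('hello world'); // 11 bytes, matching the declared size
```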

danielepolencic
  • Thanks, this is closer to the final solution, but having form.onPart in the callback of `bs.createBlockBlobFromFile` won't help here. `createBlockBlobFromFile` is expecting a file path, which won't exist at this point. I need to somehow combine `form.onPart` with `createBlockBlobFromStream`, which takes a `ReadableStream` object. – Charlie Brown Aug 21 '13 at 19:49
  • I updated the snippet according to the [azure sdk for node](https://github.com/WindowsAzure/azure-sdk-for-node). But more simply, `part` is _(also)_ a readable stream (i.e. it provides an output) and thus is equivalent to `fs.createReadStream`. – danielepolencic Aug 21 '13 at 20:01
  • This is getting closer. We need to replace the `11` with the correct stream size. Giving it `form.expectedBytes` is incorrect. I also tried just specifying the correct bytes in code, but that doesn't work either. `createBlockBlobFromStream` never returns until it times out. – Charlie Brown Aug 21 '13 at 21:20
  • I'm looking at the documentation for node-formidable and you can use `form.bytesExpected` to get the size of the file uploaded. Therefore, you can replace `11` with `form.bytesExpected` and this time it should work (fingers crossed). – danielepolencic Aug 21 '13 at 22:14
  • Thanks @danielepolencic. `form.bytesExpected` returns the size of all form data, so the method never returns. I have updated my question with more information based on your solution. The final piece of the puzzle seems to be determining the stream size to pass to Azure. When hardcoded, it works correctly but obviously we can't hardcode it. – Charlie Brown Aug 22 '13 at 14:38
  • I'm sorry, but I don't get why `form.bytesExpected` is not right. It is supposed to return the size of the uploaded file. Isn't that what you want? – danielepolencic Aug 22 '13 at 15:18
  • It actually returns the size of the entire form including all fields and files combined. I have found the solution, I will update my question above. Please amend your solution with my fix for future ref so I can award you the points for leading me down the right path. – Charlie Brown Aug 22 '13 at 15:20
  • Scratch that, we should leave yours for others in the future, I will post the solution as an answer, but award you the bounty. – Charlie Brown Aug 22 '13 at 15:31
  • Thanks. Btw, pretty interesting question/solution. – danielepolencic Aug 22 '13 at 15:32
  • I'm getting an error about `Object # has no method 'pause'`; how can I fix this? I don't understand everything I'm doing well enough to fix it without a bit more help. EDIT: N/M, I used the above solution instead and things started working. Should have read a bit more first. – RonLugge Oct 28 '15 at 21:53
  • Hey guys, do you know if that can be used to upload to a container/folder name? I tried, but it's not working. – user666 Jul 21 '21 at 09:53
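
The size mismatch discussed in the comments above can be seen with plain Node. This sketch builds a small multipart body by hand (the boundary, field names, and file content are made up for illustration) and compares the length of the whole request body, which per the comments is what `form.bytesExpected` reports, with the length of the file content alone, which is the size the blob upload needs.

```javascript
// Build a multipart/form-data body by hand. The boundary, field names and
// file content are invented for this illustration.
function buildMultipartBody(boundary, fields, file) {
  const parts = [];
  Object.keys(fields).forEach(function (name) {
    parts.push(Buffer.from(
      '--' + boundary + '\r\n' +
      'Content-Disposition: form-data; name="' + name + '"\r\n\r\n' +
      fields[name] + '\r\n'
    ));
  });
  parts.push(Buffer.from(
    '--' + boundary + '\r\n' +
    'Content-Disposition: form-data; name="' + file.field + '"; filename="' + file.filename + '"\r\n' +
    'Content-Type: application/octet-stream\r\n\r\n'
  ));
  parts.push(file.content);
  parts.push(Buffer.from('\r\n--' + boundary + '--\r\n'));
  return Buffer.concat(parts);
}

const fileContent = Buffer.from('pretend this is PNG data');
const body = buildMultipartBody('XBOUNDARYX', { title: 'demo' }, {
  field: 'snapshot',
  filename: 'test.png',
  content: fileContent
});

// The whole body includes boundaries, headers and other fields, so it is
// always larger than the file content on its own.
console.log('whole request body:', body.length, 'bytes');
console.log('file content only: ', fileContent.length, 'bytes');
```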

There are different options for uploading binary data (e.g. images) via the Azure Storage SDK for Node without using multipart.

By working with Node's Buffer and Stream types, the data can be handled with almost all of the blob upload methods: createWriteStreamToBlockBlob, createBlockBlobFromStream, createBlockBlobFromText.

References can be found here: Upload a binary data from request body to Azure BLOB storage in Node.js [restify]
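
As a rough illustration of the Buffer and Stream relationship mentioned above, the sketch below wraps an in-memory Buffer in a readable stream and pairs it with its byte length, which is the kind of (stream, size) pair that `createBlockBlobFromStream` takes in the other answers. The `bufferToStream` helper name is hypothetical, and only Node's built-in `stream` module is used.

```javascript
const { PassThrough } = require('stream');

// Hypothetical helper: expose a Buffer as a readable stream plus its size.
function bufferToStream(buffer) {
  const stream = new PassThrough();
  stream.end(buffer); // queue the whole buffer, then signal end-of-stream
  return { stream: stream, size: buffer.length };
}

const body = Buffer.from('binary data from the request body');
const wrapped = bufferToStream(body);

// Read it back to show the stream yields exactly `size` bytes.
let received = 0;
wrapped.stream.on('data', function (chunk) { received += chunk.length; });
wrapped.stream.on('end', function () {
  console.log('declared', wrapped.size, 'bytes, streamed', received, 'bytes');
});
```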

ncinefra

If you are having trouble with .createBlockBlobFromStream while implementing these solutions, note that the method signature changed slightly in newer versions.

Old version:

createBlockBlobFromStream(containerName, blobName, part, size, callback)

New version:

createBlockBlobFromStream(containerName, blobName, part, size, options, callback)

(If you don't care about options, try passing an empty array for that parameter.)

Oddly enough, "options" is supposed to be optional, but for whatever reason, mine fails if I leave it out.
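
One way to sidestep the signature difference is a thin wrapper that always passes an explicit options argument. This is only a sketch: `uploadStream` is a hypothetical helper, it defaults to an empty object for options (an assumption; the suggestion above was an empty array), and the stub service in the usage example exists only to show the call shape without contacting Azure.

```javascript
// Hypothetical wrapper: always call the six-argument form so the callback
// is never mistaken for the options argument.
function uploadStream(blobService, container, blob, stream, size, options, callback) {
  if (typeof options === 'function') {
    // Caller used the old five-argument style: shift the arguments.
    callback = options;
    options = {};
  }
  blobService.createBlockBlobFromStream(container, blob, stream, size, options || {}, callback);
}

// Stub service used only to demonstrate the call shape (not an Azure API).
const stubService = {
  createBlockBlobFromStream: function (container, blob, stream, size, options, callback) {
    callback(null, { container: container, blob: blob, size: size, options: options });
  }
};

uploadStream(stubService, 'c', 'test.png', null, 0, function (error, result) {
  console.log('options passed through:', JSON.stringify(result.options));
});
```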

binderbound