1

I am interested in understanding how to Promisfy this block of code:

const http = require('http');
const fs = require('fs');

const download = function(url, dest, cb) {
  let file = fs.createWriteStream(dest);
  const request = http.get(url, function(response) {
    response.pipe(file);
    file.on('finish', function() {
      file.close(cb);  // close() is async, call cb after close completes.
    });
  }).on('error', function(err) { // Handle errors
    fs.unlink(dest); // Delete the file async. (But we don't check the result)
    if (cb) cb(err.message);
  });
};

My first take on this was something to the extent of:

const http = require('http');
const fs = require('fs');

const download = async (url, dest, cb) => {
  let file = fs.createWriteStream(dest);
  const request = http.get(url, function(response) {
    response.pipe(file);
    file.on('finish', function() {
      const closed = await file.close(cb);  // close() is async, await here?
      if (closed) {
          // handle cleanup and retval
      }
    });
  }).on('error', function(err) { // Handle errors
    const deleted = await fs.unlink(dest); // Delete the file async. 
    if (!deleted) { ... }
  });
};

The implementation above is clearly wrong. What is the right away to approach this to remove callbacks and just use async/await?

randombits
  • 47,058
  • 76
  • 251
  • 433
  • 1
    Two hints. First, in the newest versions of node.js, use the `fs.promises` interface to the `fs` module. That's where you get promise support for the `fs` module. The regular functions like `fs.unlink()` do not support promises at all so using `await` with them does nothing useful. Second, streaming has no built-in promise support. If you want to promisify something that uses streaming, you have to do it manually by putting it inside a `new Promise(...)` and manually calling `resolve()` and `reject()` when the streaming is done or has an error. – jfriend00 Oct 21 '19 at 17:45
  • @jfriend00 `streaming has no built-in promise support` -- not entirely true, see [`Symbol.asyncIterator`](https://nodejs.org/api/stream.html#stream_readable_symbol_asynciterator) on the readable stream interface. – Patrick Roberts Oct 21 '19 at 17:53
  • @PatrickRoberts - OK, that's newish I guess (at least new to me). Can you post an answer that shows best how to use it in this context? – jfriend00 Oct 21 '19 at 17:55
  • &jfriend00 I can but not right now. I won't be offended if you wish to FGITW on me – Patrick Roberts Oct 21 '19 at 17:57
  • @PatrickRoberts - Honestly, I haven't grokked asyncIterators yet or understand how you'd use them to know when a `.pipe()` was done which is what this needs to know I think. – jfriend00 Oct 21 '19 at 17:58
  • Have a look at [How to return a promise when writestream finishes?](https://stackoverflow.com/q/39880832/1048572) – Bergi Oct 21 '19 at 20:19
  • What's the protocol when there are two great and correct answers? I hate having to award it to only one. – randombits Oct 22 '19 at 15:42

2 Answers2

3

Here's a way to manually wrap the pipe operation in a promise. Unfortunately, most of this is just error handling to cover all the possible places an error can occur:

const http = require('http');
const fs = require('fs');

const download = function(url, dest) {
  return new Promise((resolve, reject) => {
      const file = fs.createWriteStream(dest);

      // centralize error cleanup function
      function cleanup(err) {
          reject(err);
          // cleanup partial results when aborting with an error
          file.on('close', () => {
            fs.unlink(dest);
          });
          file.end();
      }

      file.on('error', cleanup).on('finish', resolve);

      const request = http.get(url, function(response) {
        if (response.status < 200 || response.status >= 300) {
            cleanup(new Error(`Unexpected Request Status Code: ${response.status}`);
            return;
        }
        response.pipe(file);
        response.on('error', cleanup);

      }).on('error', cleanup);
  });
};

download(someURL, someDest).then(() => {
  console.log("operation complete");
}).catch(err => {
  console.log(err);
});

This does not wait for files to be closed or removed in the error conditions before rejecting (figuring that there's typically nothing constructive to do if those cleanup operations have errors anyway). If that is desired, it could be added easily by just calling reject(err) from the asynchronous callbacks for those cleanup operations or by using the fs.promises version of those functions and awaiting them.


A few things to note. This is mostly error handling because there are three possible places you can have errors and some cleanup work needed for some errors.

  1. Added required error handling.

  2. In the OP's original code, they called file.close(), but file is a stream and there is no .close() method on a writeStream. You call .end() to close the write stream.

  3. You also probably need to check for an appropriate response.status because http.get() still returns a response object and stream even if the status is something like 4xx or 5xx.

jfriend00
  • 683,504
  • 96
  • 985
  • 979
  • 1
    @Bergi [nope](https://nodejs.org/api/http.html#http_http_get_url_options_callback), one of the few exceptions to that rule – Patrick Roberts Oct 21 '19 at 22:46
  • 1
    Btw, I'd prefer to declare `file` inside the `http.get()` callback so that I don't have to `unlink()` in the `error` handler of the request object returned by `http.get()`. – Patrick Roberts Oct 21 '19 at 22:48
  • @PatrickRoberts - Do you mean call `file.createWriteStream()` inside the request handler so if the request fails immediately, then you don't have to cleanup the file? – jfriend00 Oct 21 '19 at 22:51
  • @PatrickRoberts - Do either of you know if you need error handlers on both the read stream and write stream when piping? Or, is there one place that will catch either type of error? – jfriend00 Oct 21 '19 at 22:53
  • 2
    @Bergi - OK, I've simplified the error handling by centralizing it in a single cleanup function. I know you need the `http.get().on('error' ...)` because certain failures (like a bad URL) can fail immediately there. I know you need an error hander on the writestream because you can get things like a full disk or a disk write error. I presumed you need to be able to detect a request that started flowing, but the connection dropped and thus got an error. – jfriend00 Oct 21 '19 at 23:01
  • @randombits - Did this answer your question? If so, you can indicate that to the community here by clicking the checkmark to the left of the answer. That will also earn you some reputation points here on stackoverflow for following the proper procedure. If this didn't answer your question, please comment about which part of your question you're still confused about. – jfriend00 Nov 25 '19 at 18:18
3

Here's how I'd re-write your node-style callback API as an asynchronous function:

const http = require('http');
const fs = require('fs');

async function download (url, dest) {
  const response = await new Promise((resolve, reject) => {
    http.get(url, resolve).once('error', reject);
  });

  if (response.status < 200 || response.status >= 300) {
    throw new Error(`${responses.status} ${http.STATUS_CODES[response.status]}`);
  }

  const file = await fs.promises.open(dest, 'w');

  try {
    for await (const data of response) {
      await file.write(data);
    }
  } catch (error) {
    await file.close();
    await fs.promises.unlink(dest);
    throw error;
  }

  await file.close();
}

Note that this approach uses the FileHandle class in the fs.promises namespace, as well as the Symbol.asyncIterator interface defined on the Readable stream class, which allows you to consume the data events of the response with a for await...of loop and propagate error handling from the error event of the response to the catch block by implicitly rejecting the promise returned by the underlying asynchronous iterator.

Patrick Roberts
  • 49,224
  • 10
  • 102
  • 153
  • This is cool. I think you need to check for `response.status === 200` since this is the raw `http.get()` that doesn't check that for you. – jfriend00 Oct 22 '19 at 00:02
  • And, I think there's a hidden gotcha (that I'm not sure my answer handles properly yet) where if you bail before the code has finished reading all of the `http.get()` response, it can leak memory. I've seen something implied to that effect in [examples in the doc](https://nodejs.org/api/http.html#http_http_get_url_options_callback). It seems like some higher level construct is missing here from nodejs because this type of code is 5% function and 95% error handling. I guess you could say that streams just aren't really fully promisified for simplicity. – jfriend00 Oct 22 '19 at 00:06
  • @jfriend00 bail in what way exactly? Also I can make a note that the response status isn't handled here but that may not even be a user requirement and I don't like to presume especially if it means complicating the function further. – Patrick Roberts Oct 22 '19 at 00:09
  • Say you get a write error on `file.write(data)`. The incoming `http.get()` response may not be fully consumed. I added a link to my previous comment where I saw this in the doc. – jfriend00 Oct 22 '19 at 00:10
  • @jfriend00 ah, `for await...of` implicitly calls the `throw` or `return` methods on the async iterator object if the loop bails out early. Those _should_ already implement the appropriate cleanup methods on the response object, but I can double check in the source if you like. – Patrick Roberts Oct 22 '19 at 00:11
  • FYI, It's highly unlikely that any implementation wants the caller not to know the difference between a 200 response and some error response. I think it's a safe presumption that the caller nearly always needs to know that. – jfriend00 Oct 22 '19 at 00:12
  • Hmmm, OK that would be handy if the asyncIterator handles it for you. More of this new stuff I don't know yet. – jfriend00 Oct 22 '19 at 00:13
  • @jfriend00 sure but [`200` isn't the only "non-error" response](https://httpstatuses.com/), and again, it wasn't stated as a user requirement so I'd prefer to leave the basic functionality as readable as possible until I hear from OP that they'd like that. I'll still make a comment though regarding that. – Patrick Roberts Oct 22 '19 at 00:16
  • Sure, you need to know your target host and what you are expecting from it in normal operating conditions. Maybe you want to test for any 2xx response. But, it's unlikely the caller wants no indication that it was a 2xx vs. a 4xx or 5xx. FYI, I upvoted you, just pointing out other things to consider. – jfriend00 Oct 22 '19 at 00:24
  • @jfriend00 understood. As for the error handling, [`for await...of`](https://www.ecma-international.org/ecma-262/9.0/#sec-runtime-semantics-forin-div-ofbodyevaluation-lhs-stmt-iterator-lhskind-labelset) performs the abstract operation [`AsyncIteratorClose()`](https://www.ecma-international.org/ecma-262/9.0/#sec-asynciteratorclose) when the loop ends early (even abruptly), which calls the `return()` method on the async iterator if it exists, [which it does](https://github.com/nodejs/node/blob/9f873b3a659e82eb232785c9e7cfec6df8dd5277/lib/internal/streams/async_iterator.js#L112-L134). – Patrick Roberts Oct 22 '19 at 00:32
  • OK, cool. Thx. It looks like it calls destroy on the stream. – jfriend00 Oct 22 '19 at 00:35