
I have some code that uses the new fetch API and nests the fetch calls. The first fetch gets a document that contains some anchors, and for each anchor I do another fetch. This is what the code does:

  1. I do a fetch, take the response text, parse it as HTML and pass the DOM to another function.

  2. I select some anchors from the document, check whether each one is suitable, and then do the first logging. This logs six different URLs.

  3. Now I do a nested fetch for each of the six URLs. The code is almost the same as for the first fetch. I just log the response URL to make sure that I have different responses. This second logging also shows six different URLs. Once again I parse the response text and pass the DOM to another function.

  4. And now it gets funny. The first thing I do with my DOM is print the document URL, and this prints the same URL six times. Why does it print the same URL six times although six different URLs were logged four lines earlier?

There must be some kind of side effect, but I do not understand where it is. I have different responses and I create a new DOM parser for each response. How can there be a side effect?

// 1.

fetch('https://eu.battle.net/d3/de/item/two-handed/')
  .then(response => response.text())
  .then(text => (new DOMParser()).parseFromString (text, 'text/html'))
  .then(doc => {

    // 2.

    let urls = [];
    doc.querySelectorAll('li.open a')
      .forEach(node => {
        if (!node.href.match(/flail|mighty-weapon|scythe/g)) {
          console.log ("1: " + node.href); // Prints 6 different URLs.

          // 3.

          fetch(node.href)
            .then(response => {
              console.log ("2: " + response.url); // Prints 6 different URLs.
              return response.text()
            })
            .then(text => (new DOMParser()).parseFromString (text, 'text/html'))
            .then(doc => {

              // 4.

              console.log ("3: " + doc.URL); // Prints first URL six times.
              doc.querySelectorAll('.top .ui-pagination a')
                .forEach(node => {
                  urls.push(node.href);
                });
            });
        }
      });
    return urls;
  });
ceving
  • Maybe `response.text()` returns the same thing for every request? – Felix Kling Nov 02 '17 at 22:43
  • What is the value of `doc.URL` given that the document is created via `(new DOMParser()).parseFromString (text, 'text/html')`? – Felix Kling Nov 02 '17 at 22:51
  • So `doc.URL` prints the URL of wherever this JavaScript is running, not necessarily the URL of the source from which the document was constructed. So I think it's printing the URL of the page you're running this on, rather than the source of each document. That makes sense: since you only give the HTML to the DOMParser, how could it know which source that HTML was collected from? – CRice Nov 02 '17 at 22:52
  • When I do `(new DOMParser()).parseFromString('', 'text/html').URL` I always get the same value, no matter the value I pass in. So it seems the result is expected? `node.href` and `doc.URL` are simply not related to each other. – Felix Kling Nov 02 '17 at 22:53
  • @CRice This seems to be the problem. `parseFromString` does not know the URL. And the later `node.href` consequently returns the wrong URL, because in this specific document it is a relative URL. This leads to the question how to tell DOMParser about the correct URL. – ceving Nov 02 '17 at 23:02
  • I think I found the description. [When a DOMParser is instantiated by calling new DOMParser(), it inherits the calling code's principal (except that for chrome callers the principal is set to the null principal) and the documentURI and baseURI of the window the constructor came from.](https://developer.mozilla.org/en-US/docs/Mozilla/Tech/XPCOM/Reference/Interface/nsIDOMParser) – ceving Nov 02 '17 at 23:09
  • The doc you linked says you might be able to pass in the base url when you construct the parser. I tried that but didn't have any luck in my experiments. Barring that, you can use `node.getAttribute("href")` instead of `node.href`, which will just give you the path (without the origin), then you can append that to the correct origin manually. – CRice Nov 02 '17 at 23:18
  • @ceving How do you run the code at Question without CORS issue? – guest271314 Nov 02 '17 at 23:28
  • Why do you use `doc` as an identifier twice at two different locations within the code? – guest271314 Nov 02 '17 at 23:35
  • @guest271314 from the battle.net site it works without CORS error. I had in my first version of the question an example, which did not work, because of the CORS error, which prevents access to battle.net from stackoverflow. After I realized it I have removed the snippet. – ceving Nov 02 '17 at 23:35
  • @ceving No, the code does not run from battle.net because battle.net redirects to https://www.blizzard.com/en-us/ – guest271314 Nov 02 '17 at 23:37
  • @guest271314 The code uses 7 different DOMs. No chance and no need to give them all different names. – ceving Nov 02 '17 at 23:37
  • You are using two different identifiers to reference two different `doc` variables at `.then(doc => { // 2.`, and `.then(doc => { // 4.` which could also be the issue. In any event, not able to reproduce the issue that you describe at OP, see https://stackoverflow.com/help/mcve – guest271314 Nov 02 '17 at 23:38
  • @guest271314 The redirect might occur, because you access the site from US. I access from EU and so I do not have a redirect. – ceving Nov 02 '17 at 23:40
  • Try changing one of the `doc` identifiers to a different identifier, e.g., `_doc`. The latter `doc` is still within the scope of the former `doc` identifier – guest271314 Nov 02 '17 at 23:41
  • The problem has been identified. CRice and Felix gave the right answer. – ceving Nov 02 '17 at 23:42
  • @ceving Note that `urls` will still not be what you are expecting when returned from `.then()` – guest271314 Nov 02 '17 at 23:43
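
The workaround suggested in the comments above (read the raw `href` attribute and resolve it against the correct base yourself) can be sketched roughly as follows. This is only a sketch under assumptions: `resolveHref` and `collectLinks` are hypothetical helper names that do not appear in the question, and the base is assumed to be the `response.url` of the nested fetch. It relies on the standard `URL` constructor, which accepts a base URL as its second argument.

```javascript
// A document created by DOMParser takes its URL from the page that runs the
// script, so node.href resolves relative links against the wrong base.
// Resolving the raw attribute against the fetched URL avoids that.

// Hypothetical helper: turn a raw href attribute into an absolute URL.
function resolveHref(rawHref, baseUrl) {
  return new URL(rawHref, baseUrl).href;
}

// Hypothetical helper: collect absolute link URLs from a parsed document.
// `baseUrl` is assumed to be the response.url of the fetch that produced `doc`.
function collectLinks(doc, selector, baseUrl) {
  return Array.from(doc.querySelectorAll(selector), node =>
    resolveHref(node.getAttribute('href'), baseUrl));
}
```

In step 4 of the question this would mean keeping `response.url` next to the parsed document and calling something like `collectLinks(doc, '.top .ui-pagination a', responseUrl)` instead of reading `node.href`.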

1 Answer


Are nested promises normal in Node.js? I strongly recommend that you use async/await. With async/await, asynchronous code reads much like synchronous code and behaves in similar ways, which is very useful in situations where promises are, for various reasons, inconvenient.
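
A rough sketch of what an async/await version of the question's flow might look like. It is not a drop-in replacement: `collectUrls`, `fetchText` and `extractLinks` are hypothetical names, and the fetching and parsing are passed in as functions so the control flow stays visible. The sketch also waits for every nested request before returning, which addresses the comment that `urls` is returned before the nested fetches have finished.

```javascript
// Assumed shapes of the injected (hypothetical) helpers:
//   fetchText(url)                     -> Promise<{ url, text }>
//   extractLinks(text, base, selector) -> array of absolute URLs
// In a browser, fetchText could be built on fetch(), returning response.url
// and the result of response.text().
async function collectUrls(startUrl, fetchText, extractLinks) {
  // Steps 1 and 2: load the start page and pick the suitable anchors.
  const first = await fetchText(startUrl);
  const categoryUrls = extractLinks(first.text, first.url, 'li.open a')
    .filter(url => !/flail|mighty-weapon|scythe/.test(url));

  // Step 3: start all nested requests, then wait until every one is done.
  const pages = await Promise.all(categoryUrls.map(url => fetchText(url)));

  // Step 4: resolve each page's links against that page's own URL.
  return pages.flatMap(page =>
    extractLinks(page.text, page.url, '.top .ui-pagination a'));
}
```

Because the function only returns after `Promise.all` has settled, the caller receives the complete list rather than an array that is still being filled.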

user8685433