I am using DOMParser to build a new document (DOM) from a remote web page so that I can write an in-browser web scraper. This is the code I use to retrieve the remote page:

function getDOM(url){
  // Variables
      let dom = new Document;
      let xhr = new XMLHttpRequest();
      let cors = "https://cors-anywhere.herokuapp.com/"

  // Create and send the XMLHttpRequest
      xhr.open('GET', cors + url, true);

      // xhr.responseType = 'document'; // This used to be necessary, but after I changed
      // the way I was accomplishing things it is not necessary anymore. I do not
      // know why.

  // This loads the xhr request into a new DOMParser
      xhr.onreadystatechange = function () {
          if (xhr.readyState === 4) {
              dom = new DOMParser().parseFromString(xhr.responseText, 'text/html')

              // This line of code is capable of printing the value of the "productTitle"
              // element to the console, but if I try this same line of code outside of
              // this if statement, it throws an error saying "cannot read
              // properties of null".
              console.log(dom.getElementById('productTitle').innerText);
          }
      }

  // This sends the request to the specified url
      xhr.send(null)

  // Returns the document to the product class
      return dom;
}

The issue is that I cannot manipulate or access any of the contents of the "dom" object outside of the if statement that contains "dom = new DOMParser().parseFromString(xhr.responseText, 'text/html')". Shouldn't the parsed document be assigned to the "dom" variable and be accessible in the rest of the function? Inside the if statement, I can access the data successfully through the "dom" object that holds the result of the DOMParser().

Edit: Though this question (link to another, similar question) is indeed trying to solve the same technical problem that I am having, the solutions given use AJAX, jQuery, and other technologies that I am not using. I am only using plain JavaScript and therefore need a solution that is not given in that question.

  • What is the point of `let dom = new Document;`? – kiner_shah Feb 05 '22 at 05:49
  • kiner_shah - I was getting "undefined" errors in the methods of the class that calls this function. Adding `let dom = new Document;` solved that issue specifically, but seeing as I still have this issue, that may very well be unnecessary. I am still very new to JavaScript. – Thomas the plane Feb 05 '22 at 05:59
  • Maybe you should post a minimal reproducible example to show the class containing the methods and the errors you got. – kiner_shah Feb 05 '22 at 06:07
  • tkausl - That answer does explain why I am having this issue, but I am only using pure JavaScript, not AJAX or any of the other things they are talking about using in that answer, so my solution was not in the answers to that question. The solution that I found to my problem was to have the class creation happen in the `xhr.onload` handler. That way the constructor of the class has all of the necessary data on hand when it is called. I will add an answer to this post once I have tidied things up. – Thomas the plane Feb 05 '22 at 20:40
  • "*i cannot manipulate or access any of the contents of the "dom" object outside of the if statement*" - it's not the `if` statement that constrains you, it's the **asynchronous** `onreadystatechange` callback. "*I am only using plain javaScript and therefor need a solution that is not given in that question.*" - the linked question does have [answers like this](https://stackoverflow.com/a/16825593/1048572) using XHR (ajax) with promises and callbacks, which are the plain technologies you are using. – Bergi Feb 05 '22 at 20:57
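
To make the callback and promise approach from the comments above concrete, here is a minimal sketch in plain JavaScript. It is only one way this could look, not the asker's final code. The names `onSuccess`, `onError`, `getDOMAsync`, and the `deps` parameter are hypothetical and not from the question; `deps` exists only so the browser globals can be replaced with stand-ins outside a browser. The HTTP status check is also an addition. The point it illustrates: `xhr.send()` returns immediately, so the parsed document has to be handed over from inside the handler that runs once the response has arrived.

```javascript
// Sketch: deliver the parsed document to the caller when the response arrives,
// instead of returning it before the request has finished.
// `deps` is a hypothetical hook for stand-ins; in a browser the global
// XMLHttpRequest and DOMParser are used.
function getDOM(url, onSuccess, onError, deps = {}) {
  const XHR = deps.XHR || XMLHttpRequest;
  const Parser = deps.Parser || DOMParser;
  const cors = 'https://cors-anywhere.herokuapp.com/'; // proxy used in the question

  const xhr = new XHR();
  xhr.open('GET', cors + url, true);

  // This function runs later, after the whole response has been received.
  xhr.onload = function () {
    if (xhr.status >= 200 && xhr.status < 300) {
      onSuccess(new Parser().parseFromString(xhr.responseText, 'text/html'));
    } else {
      onError(new Error('Request failed with status ' + xhr.status));
    }
  };

  // Network-level failures (for example, the proxy being unreachable).
  xhr.onerror = function () {
    onError(new Error('Network error while requesting ' + url));
  };

  xhr.send(null);
  // Nothing useful can be returned here: the response does not exist yet.
}

// Promise wrapper around getDOM, so callers can use .then() or await.
function getDOMAsync(url, deps) {
  return new Promise(function (resolve, reject) {
    getDOM(url, resolve, reject, deps);
  });
}
```

In a browser, usage might look like `getDOMAsync(url).then(dom => console.log(dom.getElementById('productTitle').innerText));`. Anything that needs the document, such as constructing the product class, goes inside the callback or after an `await`, which matches the `xhr.onload` fix described in the comments.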
