I'm using the Fetch API to get some (public) data from a webpage. Here is my code:

const proxyurl = "https://cors-anywhere.herokuapp.com/";
var bookUrl = "the+old+man+and+the+sea";
var url = 'https://miss.ent.sirsidynix.net/client/en_US/mlsathome/search/results?qu=' + 
bookUrl + '&te=ILS';

fetch(proxyurl + url).then(function(result){
    return result.text();
}).then(function(text){
    var parser = new DOMParser();
    var html = parser.parseFromString(text, 'text/html');
    var divs = html.getElementsByTagName('div');
    var x = html.getElementsByTagName('thead');
    console.log(divs);
    console.log(x);
});

The console output for `divs` is "HTMLCollection(80)", so I know the page has loaded, but the output for `x` is "HTMLCollection(0)", meaning that no `thead` elements were found. However, when I inspect the page in the browser, I can see multiple `thead` tags. I have also noticed that whenever I load the page, the tables take a little longer to appear than the rest of the content. How do I get the Fetch API to return the entire page, including the tables?

Manny
  • What you're getting in that fetch is the page **before the JavaScript has run**: there are no `thead` elements in the HTML until JavaScript on that page runs and puts them there – Jaromanda X Jun 13 '18 at 22:41
  • I still can't access the tables the website has generated, and it's not like the tables are hidden or anything. [This](https://miss.ent.sirsidynix.net/client/en_US/mlsathome/search/results?q=the+old+man+and+the+sea&x=49&y=11) is the link to the webpage if you'd like to see. – Manny Jun 13 '18 at 22:45
  • Might not be the only issue, but aren't you trying to fetch an invalid URL? Both `proxyurl` and `url` are URLs, and concatenating them results in an invalid URL – user3210641 Jun 13 '18 at 22:46
  • The URL is valid, since I can see the rest of the web page's HTML. The proxy URL is just there to work around the CORS restriction. – Manny Jun 13 '18 at 22:50
  • `I still can't access the tables the website has generated` Of course not, because you are getting the raw HTML, as I said. What fetch returns is the raw page; the JavaScript in that page does not run, so there are no tables – Jaromanda X Jun 13 '18 at 22:50
  • console.log the raw HTML, i.e. the `text` argument, and you will see that I'm right – Jaromanda X Jun 13 '18 at 22:51
  • Yup, you're right. How can I get the javascript? Is there a way to wait for the entire page to load? – Manny Jun 13 '18 at 22:53
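
The check suggested in the comments (looking at the raw `text` rather than the parsed document) can be sketched roughly as follows. This is only an illustration: `countOpeningTags` is a hypothetical helper, `sampleHtml` is made-up markup standing in for the real response, and a regex count is a rough diagnostic rather than a real HTML parser.

```javascript
// Hypothetical helper (not part of the Fetch API): counts opening tags of a
// given name in raw HTML text. Nothing in the text is executed.
function countOpeningTags(rawHtml, tagName) {
  // Match "<tagName" followed by whitespace, ">" or "/", case-insensitively.
  const pattern = new RegExp('<' + tagName + '(\\s|>|/)', 'gi');
  const matches = rawHtml.match(pattern);
  return matches === null ? 0 : matches.length;
}

// Made-up sample page: the table is built by a script, so the markup itself
// contains no <thead> tag, only the string "thead" inside the script.
const sampleHtml = [
  '<html><body>',
  '<div id="results"></div>',
  '<script>',
  '  var table = document.createElement("table");',
  '  table.appendChild(document.createElement("thead"));',
  '  document.getElementById("results").appendChild(table);',
  '</script>',
  '</body></html>'
].join('\n');

console.log(countOpeningTags(sampleHtml, 'div'));   // 1
console.log(countOpeningTags(sampleHtml, 'thead')); // 0
```

In the question's code, the same call could go inside the second `.then`, for example `countOpeningTags(text, 'thead')`. If that returns 0 while the browser's inspector shows `thead` elements, the tables are most likely added by a script after the initial HTML arrives, and neither `fetch` nor `DOMParser` runs that script.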
