I am using jsdom to parse results from Google Shopping. The following code takes a Google Shopping link, loads and parses the page, and extracts the table that contains all of the results:
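```javascript
const jsdom = require('jsdom').JSDOM;

function parseSite() {
    const url = "https://www.google.com/shopping/product/8352592323560827089/online";
    let trimmedTable = "";
    // fromURL fetches and parses the page; the .then() callback runs later, asynchronously
    jsdom.fromURL(url).then(function (dom) {
        let innerHtml = dom.window.document.querySelector('html').innerHTML;
        // Locate the start of the results table body
        let tableStartIndex = innerHtml.search("<tbody><tr ");
        // Skip past "<tbody>" (7 characters) so the slice starts at the first <tr
        let nonTrimmedTable = innerHtml.substr(tableStartIndex + 7, innerHtml.length);
        // Drop everything from the closing </tbody></table> onward
        let tableEndIndex = nonTrimmedTable.search("</tbody></table>");
        trimmedTable = nonTrimmedTable.substr(0, tableEndIndex);
    });
    // trimmedTable is still "" here: the .then() callback has not run yet
}

parseSite();
```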
I realize that promises are asynchronous and that I seem to be trying to use one synchronously, but jsdom was the only tool I could find that loads the entire web page as if it were a browser. I do not want to use Selenium because performance would take a hit. The code itself works exactly how I want it to; I just need to get the value of `trimmedTable` outside of the promise.
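The reason `trimmedTable` stays empty can be shown without jsdom at all. In this minimal sketch, `Promise.resolve` stands in for `jsdom.fromURL(url)`, and `"<tr>...</tr>"` is placeholder data, not real page content (both are assumptions for illustration). The `.then()` callback is queued as a microtask and only runs after the surrounding synchronous code has finished, so any synchronous read of the outer variable still sees its initial value.

```javascript
// Illustrates why assigning inside .then() does not help synchronous code.
// Promise.resolve(...) is a stand-in for jsdom.fromURL(url); "<tr>...</tr>" is placeholder data.
function showAssignmentTiming() {
  let trimmedTable = "";
  const done = Promise.resolve("<tr>...</tr>").then(function (value) {
    trimmedTable = value; // runs later, as a microtask
  });
  const seenSynchronously = trimmedTable; // still "" at this point
  // Only code chained after the promise can observe the assigned value.
  return done.then(function () {
    return { seenSynchronously: seenSynchronously, seenAfterward: trimmedTable };
  });
}

showAssignmentTiming().then(function (result) {
  console.log(result); // { seenSynchronously: '', seenAfterward: '<tr>...</tr>' }
});
```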
My question: Is there something better than jsdom for loading web pages and extracting data from them as if they were loaded in a browser (something that can do what the code above does)? If not, how can I write my code so that the result of `trimmedTable` is assigned to a variable outside of the promise?
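One common way to get the value out is not to assign it to an outer variable at all, but to return the promise from `parseSite` and `await` it in the caller. The sketch below keeps the slicing logic from the code above (using `indexOf`/`slice` in place of `search`/`substr`, plus a guard for a missing table) and assumes Node.js with the `jsdom` package installed. The `loadDom` parameter is a hypothetical addition so the flow can be exercised without network access; by default it calls `JSDOM.fromURL`.

```javascript
// Sketch: return the promise instead of assigning to an outer variable.
// Assumes Node.js and the jsdom package. loadDom is a hypothetical injection point
// that defaults to JSDOM.fromURL (jsdom is required lazily, only when actually called).

// Same slicing as the original code: from the first "<tbody><tr " (skipping "<tbody>")
// up to "</tbody></table>". Returns null if either marker is missing.
function extractTable(html) {
  const start = html.indexOf("<tbody><tr ");
  if (start === -1) return null;
  const rest = html.slice(start + "<tbody>".length);
  const end = rest.indexOf("</tbody></table>");
  return end === -1 ? null : rest.slice(0, end);
}

// Resolves to the trimmed table HTML (or null) once the page has loaded.
async function parseSite(url, loadDom = (u) => require("jsdom").JSDOM.fromURL(u)) {
  const dom = await loadDom(url);
  return extractTable(dom.window.document.documentElement.innerHTML);
}
```

The caller then receives the value where it needs it, for example `const trimmedTable = await parseSite(url);` inside an async function, or `parseSite(url).then(trimmedTable => { /* use it */ })`. Code that has already run synchronously before the page finished loading cannot see the value; everything that uses `trimmedTable` has to run after the promise resolves.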