0

I'm creating an app in Google Sheets using Google App Script. The app scrapes a web page and returns an HTML page to me as a string. I'm trying to insert it into a DOM so that I can use DOM queries to find the data I need within the page. Normally, you could do that by either creating a document fragment with document.createDocumentFragment(), or creating a detached element with document.createElement(), then setting the innerHTML with the HTML string. The problem is, GAS doesn't have a document context.

Does anyone have ideas for a workaround, or maybe some other way to query a large text string?

p.s. My very first instinct was to find the data I need as an API but I can't find anything that meets my needs. Screen scraping is my last resort.

UPDATE: Thank you for whoever found and tagged the similar question. Cheerio library for GS is the correct solution. https://stackoverflow.com/a/61928025/735374

T Nguyen
  • 3,309
  • 2
  • 31
  • 48
  • Google apps script runs on a server not on a browser so there is no DOM. You might try to use XMLService as long as your html is not malformed. OF course I have no idea what you are trying to accomplish but if I wish to use DOM queries to find something I prefer to use snippets and just capture the data in arrays and store in localStorage. I can easily take the data out in a standard javascript format and paste into a file as JSON or something easily readable by javascript. – Cooper Nov 23 '21 at 06:32
  • Can you explain what you mean when you say ` I'm trying to insert it into a DOM`? What DOM? Are you creating an [HtmlService object](https://developers.google.com/apps-script/reference/html/html-service)? – Rafa Guillermo Nov 23 '21 at 08:05
  • thanks for the suggestion @Cooper. I need to do this within GAS as the screen scraping is just one piece of a larger app, all done programmatically. Snippets are only useful as small-scale, manually run code. – T Nguyen Nov 23 '21 at 09:36
  • Hi @Ihopethisishelpfultoyou, I'm _hoping_ to create a detached DOM, then I can use methods like querySelector, etc to navigate an HTML document. I understand that this is server-side JS, which is basically what the question is about - if there is a way to create a DOM where one does not exist. – T Nguyen Nov 23 '21 at 09:38
  • Why not using just importxml and xpath ? – Mike Steelson Nov 23 '21 at 10:58
  • I actually tried that @MikeSteelson, but it turned out it was too much work to harangue HTML5 into valid XML. – T Nguyen Nov 24 '21 at 07:17

1 Answers1

0

You can not create a DOM in Google Apps Script, at least not in the server-side code.

In order to query the HTML string you would need to know exactly where in the page the data you need will be with identifiable substrings such as containing element IDs - but often times these are generated and so can't be known. HTML is also famously unable to be parsed by RegEx.

You could potentially do this as a Sheets add-on, if having the script run unattended is not required. In this case the workflow would be along the lines of:

  • Scrape the HTML data as a string
  • Create an HTMLOutput object using the createHtmlOutput(html) method
  • Open the HTML as a dialog or sidebar and do your querySelector calls inside the dialog. (In this case you will have to add these as code inside a <script> tag which runs on loading the page
  • Return the values to Google Apps Script using google.script.run
Rafa Guillermo
  • 14,474
  • 3
  • 18
  • 54