1

I have the below function trying to convert a webpage to xml, so I can start extracting out some data from tables etc.

function getWebpageContent() {

  var url = "https://training.gov.au/Training/Details/BSBCRT501";
  var xml = UrlFetchApp.fetch(url).getContentText();
  var document = XmlService.parse(xml);
  Logger.log(document);

}

I'm recieving this error:

Exception: Error on line 170: The entity name must immediately follow the '&' in the entity reference.
getWebpageContent   @ Code.gs:6

When I search that webpage for the "&" symbol, (assuming that XmlService is confusing the 'and' symbol for some sort of html code and throwing an error) I can only find one hidden one. And am not sure how to circumvent it.

Any way to dodge that error and get the webpage info as Xml in Apps Script?

JackNapier
  • 318
  • 4
  • 14
  • When I saw the HTML of your URL, it seems that the HTML is not XML data. In this case, the HTML data cannot be directly parsed with `XmlService.parse`. I think that by this, such error occurs. But, unfortunately, from your question, I cannot understand about your goal. Can I ask you about the output you expect? Because from `Any way to dodge that error and get the webpage info as Xml in Apps Script?`, I cannot understand about the output values you expect. I apologize for this. From your additional information, I would like to think of the workaround for achieving your goal. – Tanaike Feb 26 '21 at 05:45
  • Thanks @Tanaike - my goal is to return some of the table

    elements to display in a spreadsheet (or document). I've actually never used xmlservice but it sounded like the easiest way to retrieve elements. I can grab the html of the page as a string but unsure then how to retrieve individual elements. If it wouldn't work as xml then any advice would be great.

    – JackNapier Feb 26 '21 at 09:35
  • To clarify further: there are many of those urls with different info in the tables. I want to be able to punch in the url, and retrieve the table data, displaying it in a spreadsheet. I can use a google sheets formula but it's not perfect, so thought apps script might work better. – JackNapier Feb 26 '21 at 09:37
  • Thank you for replying. I have to apologize for my poor English skill. Unfortunately, from your replying, I cannot still understand about the output you expect. Can I ask you about the detail of it? – Tanaike Feb 26 '21 at 12:18
  • The output I want from this page (https://training.gov.au/Training/Details/BSBCRT501) is each

    in the 'Elements and Performance Criteria' table. I want to save it as an array to then reformat into my spreadsheet. I might just use IMPORTXML in a spreadsheet formula instead.

    – JackNapier Feb 26 '21 at 23:50
  • Thank you for replying. From your replying, I proposed an answer. Could you please confirm it? If that was not the result you expect, I apologize again. – Tanaike Feb 27 '21 at 00:07

2 Answers2

1

This guy here explains the underlying cause, and also gives the solution:

The entity name must immediately follow the '&' in the entity reference

Sebastian
  • 131
  • 7
1

From your following replying,

The output I want from this page (https://training.gov.au/Training/Details/BSBCRT501) is each

in the 'Elements and Performance Criteria' table. I want to save it as an array to then reformat into my spreadsheet. I might just use IMPORTXML in a spreadsheet formula instead.

In this case, how about the following formula?

Sample formula:

=IMPORTXML("https://training.gov.au/Training/Details/BSBCRT501","//table[2]//tr")

Result:

enter image description here

Reference:

Added:

From your following replying,

That's a great answer thanks and the method I think I'll use. It doesn't link break at each point (2.1, 2.2 etc) unfortunately but it's still good. I don't think I can accept it as the answer though as it doesn't solve the specific Apps Script problem, but thanks a lot for this.

I added a sample script for using Google Apps Script. Could you please confirm it?

Sample script:

Before you use this script, please enable Sheets API at Advanced Google services. When you run this script, the table is put to the active sheet.

function myFunction() {
  const url = "https://training.gov.au/Training/Details/BSBCRT501";
  const res = UrlFetchApp.fetch(url, {muteHttpExceptions: true});
  if (res.getResponseCode() != 200) throw new Error(res.getContentText());
  const table = [...res.getContentText().matchAll(/<TABLE[\s\S\w]+?<\/TABLE>/g)];
  if (table && table[1][0]) {
    const spreadsheet = SpreadsheetApp.getActiveSpreadsheet();
    const sheet = spreadsheet.getActiveSheet();
    const resource = {requests: [{pasteData: {html: true, data: table[1][0], coordinate: {sheetId: sheet.getSheetId()}}}]};
    Sheets.Spreadsheets.batchUpdate(resource, spreadsheet.getId());
  }
}

References:

Tanaike
  • 181,128
  • 11
  • 97
  • 165
  • That's a great answer thanks and the method I think I'll use. It doesn't link break at each point (2.1, 2.2 etc) unfortunately but it's still good. I don't think I can accept it as the answer though as it doesn't solve the specific Apps Script problem, but thanks a lot for this. – JackNapier Feb 27 '21 at 04:57
  • @JackNapier Thank you for replying. I deeply apologize for my poor English skill. From `I might just use IMPORTXML in a spreadsheet formula instead.`, I had thought that in your goal, you can use `IMPORTXML` instead of Google Apps Script. So I proposed above answer. But from your replying, I knew that you wanted the sample script of Google Apps Script. So I added a sample script for using Google Apps Script. Could you please confirm it? If that was also not the direction you expect, I apologize again. – Tanaike Feb 27 '21 at 05:15
  • 1
    Thank you so much for this! Your english is fantastic! – JackNapier Feb 27 '21 at 23:42