I would like to load a web page every day (perhaps several times a day), parse its data, and email the results.

To parse the web page I am using a parse.gs script like the following:

var url = "http://example.com";
var page = UrlFetchApp.fetch(url).getContentText();
var XmlDoc = Xml.parse(page, true);   

When I parse XmlDoc, only the getElement/getElements functions are available, which makes it hard to navigate the document effectively. I would like to use something more productive, such as jQuery selectors.

As far as I understand, I have to add a jquery.html page to the project, like this:

<html>
<script src="//ajax.googleapis.com/ajax/libs/jquery/2.0.1/jquery.min.js"></script>
</html>

Then add to parse.gs the function:

function doGet() {
   return HtmlService.createHtmlOutputFromFile('jquery');
}

After calling doGet, how do I parse XmlDoc? Lines such as $('#content').html(XmlDoc); don't work.

antonio
  • You're not restricted to using `.getElement()` to navigate an XML structure - once you have *any element* you can reference its properties as you would any JavaScript object. You didn't ask for an alternative, but there you go. – Mogsdad May 31 '13 at 17:10
  • Well, `getElementById` is not available, as far as I know. Anyway, an alternative to jQuery is welcome. – antonio May 31 '13 at 17:54
  • If you're looking for an item with a specific ID, you could do a regex search to extract what you want. – Fred May 31 '13 at 18:49
  • WRT `getElementById`, check out the utility in [this answer](http://stackoverflow.com/a/16702114/1677912). A practical example of its use is in [this answer](http://stackoverflow.com/a/16860598/1677912). – Mogsdad May 31 '13 at 19:47
  • @Frederic: if you want to get a div that contains many nested divs, a regex is quite difficult to use, because you have to count the opening and closing div tags to find the matching one. – antonio May 31 '13 at 20:47

1 Answer

As an alternative to jQuery, try building on the capabilities of the Xml Service, entirely in Apps Script.

A previous answer introduced a utility function that locates elements in the XML Document by searching for properties matching the search criteria.

getElementByVal( body, 'input', 'value', 'Go' )  

... will find
  <input type="submit" name="btn" value="Go" id="btn" class="submit buttonGradient" />

It also showed one possible specialization, for searching 'id' attributes of <div>s:

getDivById( html, 'tagVal' )

... will find <div id="tagVal">
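To make the idea concrete, here is a minimal sketch of how such a recursive search could work. It is not the utility from the linked answer: it runs on a plain nested object with a hypothetical shape (`name`, `attrs`, `children`) standing in for a parsed element, so the property access would need adapting to the Xml Service's element objects.

```javascript
// Illustrative stand-in only: nodes are plain objects of the assumed
// shape { name, attrs, children }, NOT Xml Service elements.
function getElementByVal(node, elementType, attr, val) {
  // Match when both the tag name and the attribute value agree.
  if (node.name === elementType && node.attrs && node.attrs[attr] === val) {
    return node;
  }
  // Otherwise search the children depth-first.
  var children = node.children || [];
  for (var i = 0; i < children.length; i++) {
    var found = getElementByVal(children[i], elementType, attr, val);
    if (found) return found;
  }
  return null; // nothing matched in this subtree
}

// Specialization for searching the 'id' attribute of <div>s.
function getDivById(node, id) {
  return getElementByVal(node, 'div', 'id', id);
}

// Small hand-built tree standing in for a parsed page body.
var body = {
  name: 'body', attrs: {}, children: [
    { name: 'div', attrs: { id: 'header' }, children: [] },
    { name: 'div', attrs: { id: 'wrapper' }, children: [
      { name: 'div', attrs: { id: 'content' }, children: [
        { name: 'input', attrs: { type: 'submit', value: 'Go', id: 'btn' }, children: [] }
      ] }
    ] }
  ]
};

var contentDiv = getDivById(body, 'content');
var goButton = getElementByVal(body, 'input', 'value', 'Go');
var missing = getDivById(body, 'no-such-id');
```

Because the search recurses through the children, nested divs need no special handling, which is the case that makes a regex approach awkward.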

If you are able to identify an element uniquely, as in the above examples, a simple script can get you that element easily:

var url = "http://example.com";
var page = UrlFetchApp.fetch(url).getContentText();
// Parse the page leniently, as in the question
var pageDoc = Xml.parse(page, true);
// Get <div id="content">
var contentDiv = getDivById( pageDoc.getElement().body, 'content' );
...
Mogsdad