UrlFetchApp.fetch().getContentText() does not seem to retrieve all of the page source that I see at view-source:http://www.portofhueneme.org/home.php.

I heard that UrlFetchApp is just a wrapper for Python's urllib2 module. A previous post mentioned that urllib2 does not fetch content that is dynamically generated by scripts, but I can't find any scripts that would generate the rest of the page.

I'm trying to get the date listed under 'Important Announcements'.

function test_date() {
  // Fetch the page and read the response body as text.
  var url = UrlFetchApp.fetch('http://www.portofhueneme.org/home.php') ;
  var text = url.getContentText() ;
  Logger.log(text) ;

  var pattern = /Current Vessel Schedule/ ;

  // search() returns the index of the first match, or -1 if there is none.
  var start = text.search(pattern) ;
  Logger.log("\n" + start) ;
}

user1469051

1 Answer

There is no connection between UrlFetchApp and urllib2. (Perhaps what you heard was about the equivalent URL Fetch API on App Engine, although I don't know; it's definitely not true of Apps Script.) However, in general, none of the UrlFetchApp-like libraries in any language or platform will execute scripts in the page (even JavaScript's own XMLHttpRequest doesn't do that), so the observation is still relevant.

In this case, your problem is that the text doesn't match /Current Vessel Schedule/. If you look at the source for that page, you'll see that there is not just one space between the words, but a lot of whitespace, including a newline. You don't see that in the rendered page, but it's in the HTML code, which is what UrlFetchApp gives you.
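The mismatch can be reproduced without touching the network. The snippet below is only a sketch: sampleHtml is a made-up stand-in for the fetched text (the real page's markup may differ), and console.log is used so it also runs outside Apps Script.

```javascript
// Hypothetical stand-in for the fetched HTML; the real page's markup may differ.
var sampleHtml = '<td>Current\n        Vessel Schedule</td>';

// A literal single space cannot match a newline followed by indentation,
// so search() reports "no match" with -1.
var literalIndex = sampleHtml.search(/Current Vessel Schedule/);

// JSON.stringify shows the hidden whitespace as an escaped \n.
var visibleWhitespace = JSON.stringify(sampleHtml.slice(4, 26));

console.log(literalIndex); // -1
console.log(visibleWhitespace);
```

Logging a JSON.stringify'd slice of the fetched text around the words you expect is a quick way to check what whitespace is really there.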

To make this work, change the pattern in your script to /Current\s*Vessel\s*Schedule/ instead. Here's the full example:

function test_date() {
  var url = UrlFetchApp.fetch('http://www.portofhueneme.org/home.php') ;
  var text = url.getContentText() ;
  // \s* matches any run of whitespace between the words, including newlines.
  var pattern = /Current\s*Vessel\s*Schedule/ ;
  var start =  text.search(pattern) ;
  Logger.log(start) ;
}
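
As a possible next step toward pulling out the date, the rough sketch below finds the heading and then returns the first date-like string after it. The helper name findDateAfterHeading, the M/D/YYYY date format, and the sample markup (including its date) are all assumptions for illustration, not taken from the real page, so the date pattern would need adjusting to whatever the page actually contains.

```javascript
// Sketch only: the M/D/YYYY format and the sample markup below are assumptions.
function findDateAfterHeading(html) {
  // Whitespace-tolerant heading pattern.
  var headingMatch = /Current\s*Vessel\s*Schedule/.exec(html);
  if (headingMatch === null) {
    return null; // heading not present in the text
  }
  // Only look at the text that follows the heading.
  var rest = html.slice(headingMatch.index + headingMatch[0].length);
  var dateMatch = /\d{1,2}\/\d{1,2}\/\d{4}/.exec(rest);
  return dateMatch === null ? null : dateMatch[0];
}

// Made-up sample text; in Apps Script you would pass getContentText() instead.
var sample = '<a href="#">Current\n   Vessel Schedule</a> (updated 6/18/2012)';
console.log(findDateAfterHeading(sample));
```

Returning null for both "heading not found" and "no date after the heading" keeps the caller's check simple; split the two cases if you need to tell them apart.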
Corey G