
Using this webpage as an example: http://forums.macrumors.com/showthread.php?t=1688317. In a Google spreadsheet, the following do NOT work with importxml():

//a[contains(@href,"showpost")]/@href
//a[contains(@href,"showcount")]/@href
//*[@id="postcount18545482"] 

The last one (//*[@id="postcount18545482"]) was copied directly from Chrome's element viewer.

The following DO work, but the results exclude anything containing the words "showcount", "postcount", or "showpost":

//div[contains(@id,"post_message")]/@id
//a[contains(@href,"show")]/@href
//a[contains(@href,"post")]/@href

Is there something special about the word "count" when working with importxml() or XPath? How can I get the missing entries?

Rubén

1 Answer


The ImportXML function in a Google Docs spreadsheet cannot process data that is created in a two-step process: for example, when an authentication token must be retrieved before making the URL request, or when the URL tells the server to dynamically create an XML output and then redirects the user to that output, even if the URL stays the same. You might want to look into Google Apps Script (http://code.google.com/googleapps/appsscript/index.html) to handle this case.

Taken from here

In your particular case the anchor parameters get set in the vbulletin_post_loader.js script called after the page container is loaded.

...
pc_obj=fetch_object("postcount"+this.postid);
openWindow("showpost.php?"+(SESSIONURL?"s="+SESSIONURL:"")
+(pc_obj!=null?"&postcount="+PHP.urlencode(pc_obj.name):"")+"&p="+A)
...

In other words, when importXML() scans the page, the nodes containing 'showpost' or 'postcount' in their href are not yet on the page.

It looks like importXML() works with static pages only and cannot handle dynamically loaded content.

Try to find another way of obtaining the number of posts in a thread.
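One possible "other way", sketched below, is to count the post containers that are already present in the static HTML, since the question reports that //div[contains(@id,"post_message")]/@id does return results. This is only a rough sketch: the function name countPosts and the sample markup are made up for illustration, and it assumes each post body is a div whose id starts with "post_message". The code is plain JavaScript, so it should also run in Google Apps Script, where the HTML string could presumably come from UrlFetchApp.fetch(url).getContentText().

```javascript
// Hypothetical helper: counts distinct post containers in raw (static) HTML.
// Assumption: each post body is a <div> whose id starts with "post_message",
// which is what the working XPath from the question matches.
function countPosts(html) {
  // Match an opening <div ...> tag and capture an id beginning with "post_message".
  const pattern = /<div\b[^>]*\bid\s*=\s*["'](post_message[^"']*)["']/gi;
  const ids = new Set(); // a Set ignores ids that appear more than once
  let match;
  while ((match = pattern.exec(html)) !== null) {
    ids.add(match[1]);
  }
  return ids.size;
}

// Made-up sample markup for illustration only, not copied from the forum page.
const sampleHtml = [
  '<div id="post_message_101">First post</div>',
  '<div class="x" id="post_message_102">Second post</div>',
  '<div id="sidebar">Not a post</div>',
].join("\n");

console.log(countPosts(sampleHtml));
```

Note that this counts only the posts on the page that was fetched; a thread that spans several pages would need each page fetched and the counts summed.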

Community
Vlad.Bachurin