I am trying to write a Python script to crawl a rock climbing rankings website. The site uses a combination of redirects and frames that has defeated every attempt I've made to access the data at the URL. I've tried a few different crawler scripts, as well as curl on the command line, and none of them gets anything more than an empty document.
For reference, this is an example of the kind of URL I am trying to access:
http://www.8a.nu/Scorecard/AscentList.aspx?UserId=1476&AscentType=0&AscentClass=0&AscentListTimeInterval=1&AscentListViewType=0&GID=ea0fb3b90e4b0b655580384e07974b38
Which redirects to this URL:
http://www.8a.nu/?IncPage=http%3A//www.8a.nu/Scorecard/AscentList.aspx%3FUserId%3D1476%26AscentType%3D0%26AscentClass%3D0%26AscentListTimeInterval%3D1%26AscentListViewType%3D0%26GID%3Dea0fb3b90e4b0b655580384e07974b38
Which is, itself, a page containing several frames. Even more confusingly, the author uses JavaScript to redirect back to the enclosing frame page if you try to view the frame by itself.
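To make the relationship between the two URLs concrete: the second URL appears to be just the site root with the original URL percent-encoded into an `IncPage` query parameter. The sketch below decodes it with the standard library. The names `wrapper_url` and `inner_frame_url` are my own, made up for illustration.

```python
from urllib.parse import urlsplit, parse_qs

# The wrapper URL from above, split across lines for readability.
wrapper_url = (
    "http://www.8a.nu/?IncPage=http%3A//www.8a.nu/Scorecard/AscentList.aspx"
    "%3FUserId%3D1476%26AscentType%3D0%26AscentClass%3D0"
    "%26AscentListTimeInterval%3D1%26AscentListViewType%3D0"
    "%26GID%3Dea0fb3b90e4b0b655580384e07974b38"
)


def inner_frame_url(url):
    """Return the decoded value of the IncPage query parameter."""
    query = parse_qs(urlsplit(url).query)  # parse_qs also percent-decodes values
    return query["IncPage"][0]


if __name__ == "__main__":
    print(inner_frame_url(wrapper_url))
```

Decoding it gives back exactly the first URL, so the wrapper page presumably loads that address into one of its frames.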
It seems as if the web server refuses to serve any data for the contents of the frame unless the request actually comes from inside that frame. This makes it extremely difficult to programmatically access the contents of the frame. Any advice on how I can get at the contents of this frame would be hugely appreciated. At a deeper, more conceptual level, how the heck does the website know to refuse to serve the document when it's not in a frame?
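For what it's worth, here is a sketch of what I would try next. It rests on two guesses that I have not confirmed: that the server looks at the `Referer` header to decide whether the request came from the frame page, and that it sets a session cookie when the frame page is loaded. The function names and the User-Agent string are placeholders of my own, not anything the site documents.

```python
import http.cookiejar
import urllib.request


def build_opener_with_cookies():
    """Create an opener that keeps cookies between requests."""
    jar = http.cookiejar.CookieJar()
    return urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))


def make_frame_request(inner_url, wrapper_url):
    """Build a request for the frame that claims the wrapper page as its referrer."""
    return urllib.request.Request(
        inner_url,
        headers={
            # Assumption: the server checks Referer. This is only a guess.
            "Referer": wrapper_url,
            # Placeholder browser-like User-Agent, in case the default is rejected.
            "User-Agent": "Mozilla/5.0",
        },
    )


def fetch_frame(inner_url, wrapper_url):
    """Load the wrapper page first (to pick up any cookies), then the frame."""
    opener = build_opener_with_cookies()
    opener.open(wrapper_url).read()  # first hit: collects any session cookie
    with opener.open(make_frame_request(inner_url, wrapper_url)) as response:
        return response.read()
```

Calling `fetch_frame` with the first URL as `inner_url` and the second as `wrapper_url` would show whether either guess is right. If the check happens only in JavaScript on the client, neither header should matter and the raw response should already contain the data.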