15

I am navigating a site using Python's mechanize module and am having trouble clicking on a JavaScript link for the next page. I did a bit of reading, and people suggested I need python-spidermonkey and DOMforms. I managed to get them installed, but I am not sure of the syntax to actually click on the link.

I can identify the code on the page as:

<a href="javascript:__doPostBack('ctl00$MainContent$gvSearchResults','Page$2')">2</a>

Does anyone know how to click on it, or whether there is perhaps another tool for this?

Thanks

Lostsoul
  • Wouldn't you just click on it normally? If python-spidermonkey and DOMForms are any good it would just work. – Peter C Mar 06 '11 at 01:14
  • I was trying to, but it's really hard to find examples that work. I am actually not sure how to do it. Most of the commands I found in the examples didn't work either. I have read a lot of people suggesting these tools for their ability to handle JavaScript, but using the packages is not very straightforward. – Lostsoul Mar 06 '11 at 01:19
  • If I need to deal with JavaScript, I avoid mechanize (or twill, which I prefer) and instead use something like [Selenium](http://seleniumhq.org) or [Splinter](http://splinter.cobrateam.info) (which is my favorite between the two). – brandizzi Dec 03 '11 at 04:20

3 Answers

6

I mainly use HtmlUnit under Jython for these use cases. I also published a simple article on the subject: Web Scraping Ajax and Javascript sites.

sw.
  • Thanks, I am looking into this right now. To be honest, I am somewhat new to OO programming and still trying to figure out Java. I was avoiding an all-Java solution because if things break I didn't know how well I could troubleshoot. I feel better with Python, but this solution looks really good; it seems like I can write Python scripts and call Java scripts to create variables to pass back and forth. Your site rocks and there seems to be a good chunk of documentation/samples of HtmlUnit. – Lostsoul Mar 08 '11 at 16:33
  • Thanks Lostsoul. I think combining languages like Python or Ruby (JRuby) with Java frameworks gives a lot of power. Java has some of the more developed frameworks, but they are often complex to use directly. – sw. Mar 08 '11 at 19:21
2

Instead of struggling with python-spidermonkey, try the Qt Python bindings for WebKit.

Here is a full example to execute JavaScript and extract the final HTML.

hoju
  • This looks very interesting. I just installed it and will play around with it. I found one sample script and not much documentation on using the webkit. – Lostsoul Mar 08 '11 at 16:30
  • Added an example. Yeah, unfortunately it is hard to find many examples about it. Most people use Qt/WebKit via C++. – hoju Dec 03 '11 at 03:25
0

How about calling __doPostBack('ctl00$MainContent$gvSearchResults','Page$'+pageid); (a JavaScript function) via python-spidermonkey?
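
A possible alternative to running the JavaScript at all: on ASP.NET WebForms pages, __doPostBack typically just copies its two arguments into the hidden form fields __EVENTTARGET and __EVENTARGUMENT and then submits the form. Assuming this page follows that pattern (the question does not confirm it), a minimal sketch of pulling those two values out of the link, using only the standard library, might look like this. The helper name postback_fields is made up for illustration.

```python
import re

# Matches the two quoted arguments of a __doPostBack('target','argument') call.
POSTBACK_RE = re.compile(r"__doPostBack\('([^']*)','([^']*)'\)")


def postback_fields(href):
    """Return the hidden-field values that __doPostBack would set.

    Hypothetical helper: assumes the standard ASP.NET field names
    __EVENTTARGET and __EVENTARGUMENT.
    """
    match = POSTBACK_RE.search(href)
    if match is None:
        raise ValueError("not a __doPostBack link: %r" % href)
    target, argument = match.groups()
    return {"__EVENTTARGET": target, "__EVENTARGUMENT": argument}


# The link from the question.
href = "javascript:__doPostBack('ctl00$MainContent$gvSearchResults','Page$2')"
fields = postback_fields(href)
print(fields)
```

With mechanize, these values would then go into the form's hidden controls before calling submit(). Hidden controls are read-only by default, so the form probably needs form.set_all_readonly(False) first. This is untested against the site in question.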

Noufal Ibrahim
n00b
  • Thank you very much for your quick reply n00b32. I'm very new to this spidermonkey and am still a little confused. How exactly would I do this? There isn't much documentation or sample scripts I could find for spidermonkey. I'm basically at the point where I have imported mechanize and beautifulsoup and have a variable (soup1) that fixes all the broken html in the page. I can get the link above in a variable but not sure what to do after that, I'm still very confused. It would be great to get an example or if you could direct me where to learn. Thanks again! – Lostsoul Mar 06 '11 at 07:48