Take a look at any live auction on http://www.quibids.com/. I want to scrape the Bid History, which appears to be updated by a JavaScript timer. When I use Inspect Element in Chrome, the source updates automatically. Is there some way to do that with screen scraping? I'm using Ruby, if it matters. What I want to avoid is hammering that page every second.

Ben Wiseley
  • Inspect the AJAX requests that your browser is making, and mimic them. – Richard H May 19 '11 at 17:51
  • If your intention to scrape this site is to "gain" an advantage over other bidders/bots I would be very careful. This site looks like a clone (/sister) of Swoopo. Such sites are a **barely legal** *scam* and as such I would avoid wasting your money on bidding: http://www.codinghorror.com/blog/2008/12/profitable-until-deemed-illegal.html – scunliffe May 19 '11 at 18:17
  • @dimitrov - sorry - i actually didn't know i had to go accept answers. i went back and accepted ones. – Ben Wiseley May 23 '11 at 22:42
  • @scunliffe - just collecting stats for someone who's doing a project. – Ben Wiseley May 23 '11 at 22:42
  • Possibly relevant: http://stackoverflow.com/questions/2073481/headless-scriptable-firefox-webkit-on-linux – Piskvor left the building May 27 '11 at 08:23

1 Answer

You could use a browser engine that can execute the JavaScript, such as WebKit (there's a scriptable wrapper for it, WebkitDriver).

Or check what the JavaScript timer is doing with a tool like Firebug. It is likely making an AJAX request to fetch the updated data, and you can call those AJAX URLs directly.
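As a rough sketch of the second approach in Ruby: the endpoint URL, the JSON response format, and the `id` field below are all assumptions made for illustration. The real URL and payload have to be read off the browser's network panel, and the site may well return something other than JSON. The helper names `fetch_body`, `new_bids`, and `poll` are hypothetical too.

```ruby
require "json"
require "net/http"
require "set"
require "uri"

# Hypothetical endpoint: replace with the URL seen in the network panel.
ENDPOINT = "http://example.com/ajax/bid_history?auction_id=123"

# Fetch the raw response body from the AJAX URL.
def fetch_body(url)
  Net::HTTP.get(URI(url))
end

# Parse a JSON array of bids and return only the ones not seen before.
# Assumes each bid is an object with a unique "id" field (an assumption).
def new_bids(body, seen_ids)
  # Set#add? returns nil when the id is already present, so repeats drop out.
  JSON.parse(body).select { |bid| seen_ids.add?(bid["id"]) }
end

# Poll the endpoint a fixed number of times, pausing between requests.
def poll(url, interval: 5, rounds: 3)
  seen = Set.new
  rounds.times do
    new_bids(fetch_body(url), seen).each { |bid| puts bid.inspect }
    sleep interval
  end
end
```

Calling `poll(ENDPOINT)` would then print each bid once. This still polls, but it requests only the small data payload rather than the whole page, and the interval can be set no shorter than the page's own timer.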

Piskvor left the building
hoju
  • There's a fully scriptable headless browser named HTMLUnit, but it's in Java. – Piskvor left the building May 23 '11 at 13:26
  • webkit is also scriptable and can run headless – hoju May 27 '11 at 08:11
  • Not sure if we're using the same definition of "scriptable" - I'm not aware of a webkit-based browser that can be *fully programmatically controlled by another program, without user intervention*; but then, I'm not claiming to know everything ;) Could you share a link to such a project? – Piskvor left the building May 27 '11 at 08:17
  • You piqued my curiosity, and it turns out that you were right: [one of the answers to this question](http://stackoverflow.com/questions/2073481/headless-scriptable-firefox-webkit-on-linux) mentions [WebkitDriver](http://code.google.com/p/webkitdriver/). Could you edit that into the answer? I don't think it would warrant its own answer, but would fit into yours well. – Piskvor left the building May 27 '11 at 08:21
  • feel free to update my answer if you think it is a good solution for asker. – hoju May 28 '11 at 15:20