
I am writing an image scraper with pycurl. It sends forged requests to the website's server that are identical to the ones I captured with an HTTP analyzer.

This site requires several steps of interaction before it finally responds with the image content. First I open the link with pycurl and get a gzip-encoded response containing the HTML. The site's JavaScript code then sends the request for the image, and the server generates the image with a DLL according to that request.
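As a minimal sketch of the first step, assuming the raw body and the value of the `Content-Encoding` response header have already been captured (for example with a pycurl `HEADERFUNCTION` callback), the gzip response can be decoded with the standard library. `decode_body` is a hypothetical helper name. Alternatively, setting pycurl's `ENCODING` option (e.g. `c.setopt(pycurl.ENCODING, 'gzip')`) asks libcurl to decompress the body itself.

```python
import gzip


def decode_body(raw: bytes, content_encoding: str = "") -> bytes:
    """Return the response body, gunzipping it if the server sent gzip.

    Hypothetical helper: `content_encoding` is assumed to be the value of the
    Content-Encoding header, captured separately from the body.
    """
    # Trust the header if present; otherwise sniff the gzip magic bytes 1f 8b.
    if content_encoding.strip().lower() == "gzip" or raw[:2] == b"\x1f\x8b":
        return gzip.decompress(raw)
    # Not gzip-encoded: return the bytes unchanged.
    return raw
```

The magic-byte check is a fallback for the case where the headers were not recorded; plain HTML never starts with `1f 8b`, so it is passed through untouched.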

I can already get the images by identifying the response content. However, I find it tedious to change my code every time the website changes its query steps, so I want to interact with the website through PyQt4's QtWebKit, as a browser would.
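Identifying the response content can be done by checking the first bytes of each body against well-known image signatures, which works no matter what URL or file name the server-side DLL uses. This is a minimal sketch covering only PNG, JPEG and GIF; `sniff_image_type` is a hypothetical helper, not part of pycurl or PyQt4.

```python
from typing import Optional

# Magic-number prefixes for a few common image formats.
_SIGNATURES = [
    (b"\x89PNG\r\n\x1a\n", "png"),
    (b"\xff\xd8\xff", "jpeg"),
    (b"GIF87a", "gif"),
    (b"GIF89a", "gif"),
]


def sniff_image_type(body: bytes) -> Optional[str]:
    """Return a short format name if `body` starts with a known image signature, else None."""
    for prefix, name in _SIGNATURES:
        if body.startswith(prefix):
            return name
    return None
```

Because the check only looks at the body, the same helper can be reused whichever client (pycurl or a browser engine) ends up fetching the responses.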

How can I extract the specific image content with PyQt4's QtWebKit?

Treper
  • What type of interaction does this site need? What images do you need to get? The ones that are linked on the web page or a rendering of the page itself? – Devin M Aug 20 '11 at 04:39
  • So you want to render an image of the entire page? – Devin M Aug 20 '11 at 04:50
  • Maybe I misunderstood. I just want to download the images on the web page. – Treper Aug 20 '11 at 04:53
  • Does this answer work for you? http://stackoverflow.com/questions/257409/download-image-file-from-the-html-page-source-using-python/258511#258511 – Devin M Aug 20 '11 at 04:56
  • No, because the website requires several steps: you open the URL and it responds with the image URL. You can't access the image directly because it is generated by a DLL on the website's server. So I think it can only be handled by a human-like browser. – Treper Aug 20 '11 at 04:59
