
This won't be easy to answer, so I would like some guidance instead: I want to download images from a web server. I know how to get an image from a URL, but I don't know how many pages there are. (For example, chapter 01 has 21 images, chapter 02 only 12...)

There is a combo box (DropdownChoice) on the webpage that tells how many pages that chapter has. Is there a way I can get that info?

If I can get that number, I can do a for loop from page 1 to page x and download every image.

FYI, I am using Python.
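
Something like this is what I have in mind, as a rough sketch using requests and BeautifulSoup (the URL, the `<select>` lookup, and the image filename pattern are all guesses on my part):

```python
import requests
from bs4 import BeautifulSoup

CHAPTER_URL = "http://example.com/chapter-01"  # placeholder URL

# Fetch the chapter page and count the options in the page dropdown.
html = requests.get(CHAPTER_URL).text
soup = BeautifulSoup(html, "html.parser")
dropdown = soup.find("select")  # assumes the page count is the only <select>
page_count = len(dropdown.find_all("option"))

# Download every image, assuming a predictable per-page URL pattern.
for page in range(1, page_count + 1):
    image_url = "%s/%02d.jpg" % (CHAPTER_URL, page)
    response = requests.get(image_url)
    with open("%02d.jpg" % page, "wb") as f:
        f.write(response.content)
```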

Thanks!

Illiax
  • Something like this? http://stackoverflow.com/questions/5974595/download-all-the-linksrelated-documents-on-a-webpage-using-python/5976423#5976423 – Rusty Rob Aug 28 '12 at 07:59

2 Answers


As a quick hack, you could just download sequential pages until you get a 404 (or some other error). This isn't generally considered "nice", so use it with caution, but it will allow you to download all of the images easily.
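
A minimal sketch of that idea using only Python 3's standard library (the URL pattern here is a placeholder; substitute your site's actual scheme):

```python
import urllib.request
from urllib.error import HTTPError

page = 1
while True:
    url = "http://example.com/chapter-01/%02d.jpg" % page  # placeholder pattern
    try:
        # Save each page's image under a zero-padded filename.
        urllib.request.urlretrieve(url, "%02d.jpg" % page)
    except HTTPError as e:
        if e.code == 404:
            break  # ran off the end of the chapter
        raise  # anything other than "not found" is a real error
    page += 1
```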

Alternatively, you can look at using the Scrapy package to help you download and parse webpages and images.
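
If you go that route, a bare-bones spider with a recent Scrapy might look roughly like this (the start URL and CSS selectors are assumptions you'd adapt to the actual site):

```python
import scrapy

class ChapterSpider(scrapy.Spider):
    name = "chapter"
    start_urls = ["http://example.com/chapter-01"]  # placeholder URL

    def parse(self, response):
        # Yield every image URL found on the page.
        for src in response.css("img::attr(src)").getall():
            yield {"image_url": response.urljoin(src)}
        # Follow a "next page" link if present (selector is a guess).
        next_page = response.css("a.next::attr(href)").get()
        if next_page:
            yield response.follow(next_page, callback=self.parse)
```

You can run a standalone spider file like this with `scrapy runspider chapter_spider.py -o images.json` and then fetch the collected URLs.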

nneonneo
  • +1. Although I'd say that using a `try: ... except HTTPError: ...` block is a perfectly valid way to approach this (possibly checking the type of HTTPError in the except block and re-raising if necessary). It doesn't seem like a hack to me if you make the except statement sufficiently specific (`except IOError`, for example, would definitely be bad). – Moritz Aug 28 '12 at 07:30
  • Well, it's a hack on the server, not on the client. That is, downloading pages until you hit a 404 is (at least to me) not good form. But hey, it works, and if it's a once-off sort of project then I'm perfectly OK with that. – nneonneo Aug 28 '12 at 07:32

This project in Python downloads a number of images, pre-defined by the user, from a website. I'm sure you can change it to suit your needs.

Adriano_Pinaffo