I am trying to download all images from a website, but my spider only returns one image per page/item. I want it to download every image present on the page.

        for elem in response.xpath("//img"):
            img_url = elem.xpath("@src").extract_first()
            l.add_value('image_urls', [img_url])
            l.add_value('url', response.url)
            l.add_value('project', self.settings.get('BOT_NAME'))
            l.add_value('spider', self.name)
            l.add_value('server', socket.gethostname())
            l.add_value('date', datetime.now().strftime('%Y-%m-%d %H:%M:%S'))
            return l.load_item()

When I change .extract_first() to .extract() the spider stops running, and I cannot work out how to pass each image URL (there can be dozens on a page) on as its own item and download.

Any help would be greatly appreciated!

Tomzski

1 Answer

You are only getting one image because return exits your method on the first pass through the loop, before the remaining images are processed. Use yield instead of return so that the method produces one item per image.

See this other answer for details.
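The difference can be seen without Scrapy at all. The following is a minimal sketch, not the asker's spider: it uses plain dicts in place of the item loader, and the function names and image URLs are hypothetical, chosen only for illustration.

```python
def parse_with_return(img_urls, page_url):
    # Mirrors the question: return inside the loop ends the call
    # as soon as the first URL has been handled.
    for img_url in img_urls:
        return {"image_urls": [img_url], "url": page_url}


def parse_with_yield(img_urls, page_url):
    # yield makes this a generator, so the loop keeps going and
    # produces one item per image URL.
    for img_url in img_urls:
        yield {"image_urls": [img_url], "url": page_url}


urls = ["a.png", "b.png", "c.png"]  # hypothetical image URLs
page = "https://example.com/page"   # hypothetical page URL

first_only = parse_with_return(urls, page)
all_items = list(parse_with_yield(urls, page))

print(first_only)
print(len(all_items))
```

In the spider from the question, this would presumably mean building a fresh loader for each image inside the loop and ending each iteration with yield l.load_item() rather than return l.load_item().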

Gallaecio
  • Thank you very much for the help, has fixed my problem and have learned something new in the process for the future. – Tomzski Mar 22 '19 at 16:45