Download multiple items through imagepipeline

Question

I am trying to download all images from a website, but I only get one image per page/item. I want my spider to download every image present on the page.

        for elem in response.xpath("//img"):
            img_url = elem.xpath("@src").extract_first()
            l.add_value('image_urls', [img_url])
            l.add_value('url', response.url)
            l.add_value('project', self.settings.get('BOT_NAME'))
            l.add_value('spider', self.name)
            l.add_value('server', socket.gethostname())
            l.add_value('date', datetime.now().strftime('%Y-%m-%d %H:%M:%S'))
            return l.load_item()

When I change .extract_first() to .extract(), the spider stops running. I cannot work out how to turn each image URL (there can be dozens on a page) into its own item and download.

Any help would be greatly appreciated!

score 0 · Accepted Answer · answered Mar 21 '19 at 15:25

You are only getting one image because return exits your method immediately. Use yield instead of return for the desired behavior.

See this other answer for details.
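
To illustrate the difference, here is a minimal sketch in plain Python rather than a full spider. The function names, the example URLs and the `srcs` list are hypothetical stand-ins for the spider callback, `response.url` and the result of `response.xpath("//img/@src").extract()`. It also assumes that `src` attributes may be relative, so it resolves them against the page URL with `urljoin`.

```python
from urllib.parse import urljoin


def parse_with_return(page_url, srcs):
    """Mirrors the question: `return` inside the loop ends the call on the first image."""
    for src in srcs:
        return {"image_urls": [urljoin(page_url, src)], "url": page_url}


def parse_with_yield(page_url, srcs):
    """Generator version: produces one item per image instead of stopping early."""
    for src in srcs:
        yield {"image_urls": [urljoin(page_url, src)], "url": page_url}


# Hypothetical page data standing in for response.url and
# response.xpath("//img/@src").extract()
page_url = "https://example.com/gallery/"
srcs = ["a.png", "/static/b.jpg", "https://cdn.example.com/c.gif"]

first_only = parse_with_return(page_url, srcs)      # a single dict
all_items = list(parse_with_yield(page_url, srcs))  # one dict per image

print(first_only["image_urls"])
print(len(all_items))
```

In the spider itself, the equivalent change would presumably be to end the loop body with `yield l.load_item()`. If `l` is an item loader, it likely also needs to be created inside the loop, so that each image gets a fresh item instead of values accumulating on one loader.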

Gallaecio

Thank you very much for the help. It has fixed my problem, and I have learned something new in the process for the future. – Tomzski Mar 22 '19 at 16:45
