I am new to Scrapy. I am trying to download an image from here. I was following the official documentation and this article.

My settings.py looks like:

BOT_NAME = 'shopclues'

SPIDER_MODULES = ['shopclues.spiders']
NEWSPIDER_MODULE = 'shopclues.spiders'

ROBOTSTXT_OBEY = True

ITEM_PIPELINES = {
    'scrapy.contrib.pipeline.images.ImagesPipeline':1
}

IMAGES_STORE="home/pr.singh/Projects"


and items.py looks like:

import scrapy
from scrapy.item import Item

class ShopcluesItem(scrapy.Item):
    # define the fields for your item here like:
    # name = scrapy.Field()
    pass

class ImgData(Item):
    image_urls=scrapy.Field()
    images=scrapy.Field()

I think both of these files are fine, but I am unable to write a correct spider to fetch the image. I can grab the image URL, but I don't know how to store the image using the ImagesPipeline.
My spider looks like:

from shopclues.items import ImgData
import scrapy
import datetime


class DownloadFirstImg(scrapy.Spider):
    name="DownloadfirstImg"
    start_urls=[
    'http://www.shopclues.com/canon-powershot-sx410-is-2.html',
    ]

    def parse (self, response):
        url= response.css("body div.site-container div#container div.ml_containermain div.content-helper div.aside-site-content div.product form#product_form_83013851 div.product-gallery div#product_images_83013851_update div.slide a#det_img_link_83013851_25781870")

        yield scrapy.Request(url.xpath('@href').extract(),self.parse_page)

        def parse_page(self,response):
            imgURl=response.css("body div.site-container div#container div.ml_containermain div.content-helper div.aside-site-content div.product form#product_form_83013851 div.product-gallery div#product_images_83013851_update div.slide a#det_img_link_83013851_25781870::attr(href)").extract()

            yield {
            ImgData(image_urls=[imgURl])
            }

I wrote the spider by following this article, but I am not getting anything. I run the spider with scrapy crawl DownloadfirstImg -o img5.json, and I get neither a JSON file nor an image.
How do I download an image when its URL is known? I have never worked with Python before, so things seem quite complicated to me. Links to any good tutorials would help. TIA

Prashant Prabhakar Singh

1 Answer

I don't understand why you yield a request for the image. You only need to put the image URL on the item, and the images pipeline will download and store it for you. This is all you need:

def parse(self, response):
    # Select the <a> element that links to the full-size product image.
    url = response.css("body div.site-container div#container div.ml_containermain div.content-helper div.aside-site-content div.product form#product_form_83013851 div.product-gallery div#product_images_83013851_update div.slide a#det_img_link_83013851_25781870")
    # image_urls must be a flat list of URL strings; extract_first() returns one string.
    yield ImgData(image_urls=[url.xpath('@href').extract_first()])
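The original spider went wrong mainly in the shape of the data it produced: `extract()` returns a list, so `[imgURl]` became a list inside a list, and the braces around `ImgData(...)` built a set instead of yielding the item. As a rough sketch of the shape the pipeline expects, here is a small standard-library helper. `build_image_item` and the image path are hypothetical names used only for illustration, and the sketch assumes a plain dict is acceptable as an item in place of `ImgData`.

```python
from urllib.parse import urljoin


def build_image_item(hrefs, page_url):
    """Turn extracted href values into an item dict for the images pipeline.

    `hrefs` stands in for the list that `.extract()` returns (hypothetical helper).
    """
    # Skip empty matches and resolve relative links against the page URL.
    urls = [urljoin(page_url, h.strip()) for h in hrefs if h and h.strip()]
    # image_urls must be a flat list of URL strings, not a list inside a list.
    return {"image_urls": urls}


page = "http://www.shopclues.com/canon-powershot-sx410-is-2.html"
# The image path below is made up for the example.
item = build_image_item(["/images/detailed/camera.jpg"], page)
print(item)
```

Inside `parse`, the equivalent would be to pass the result of `.extract()` to such a helper and yield what it returns, without wrapping it in braces.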
Rafael Almeida
  • Great, that worked within seconds. I was confused about what I was doing and am still not sure how all this works. Can you provide links to some good tutorials other than the documentation? Also, this code worked the first time, but after I deleted the image and tried again I get `Spider error processing (referer: None)`. What could be the reason? BTW, thanks for the help :) – Prashant Prabhakar Singh Sep 28 '16 at 12:00
  • @PrashantPrabhakarSingh When that error occurs, the traceback tells you what's wrong. What is the last line of the error? – Rafael Almeida Sep 28 '16 at 12:29
  • Forget it. There was a corrupted file in the directory; I deleted it and everything worked fine. I couldn't debug my code because I didn't understand what I had written, it was mostly copy-paste. Are there good tutorials or blogs to begin with (other than the documentation)? Thanks BTW. – Prashant Prabhakar Singh Sep 29 '16 at 03:30
  • @PrashantPrabhakarSingh http://mherman.org/blog/2012/11/05/scraping-web-pages-with-scrapy/#.V-ziSnUrJhE this one is a personal favorite – Rafael Almeida Sep 29 '16 at 09:44