I am just learning Scrapy and Python and have this issue.

When scraping this website: http://www.laughfactory.com/jokes/family-jokes the code works perfectly.

import scrapy


class JokesSpider(scrapy.Spider):
    name = 'jokes'
    allowed_domains = ['www.laughfactory.com']
    start_urls = ["http://www.laughfactory.com/jokes/family-jokes"]

    def parse(self, response):
        # Each div.jokes block holds one joke
        for joke in response.xpath("//div[@class='jokes']"):
            yield {
                # ".//" keeps the search inside the current joke block
                'joke_text': joke.xpath(".//div[@class='joke-text']").extract_first()
            }

When I use similar code on another website, https://www.ekupi.hr/hr/Ra%C4%8Dunala/Ra%C4%8Dunala-i-periferija/Monitori/c/10077, every scraped item comes back with the same name. The code:

from time import sleep

import scrapy


class eKupiSingleCategoryXPath(scrapy.Spider):
    name = "monitor_xpath"
    allowed_domains = ["https://www.ekupi.hr/hr/"]
    start_urls = ["https://www.ekupi.hr/hr/Ra%C4%8Dunala/Ra%C4%8Dunala-i-periferija/Monitori/c/10077"]

    def parse(self, response):
        for monitorSelectXPath in response.xpath("//div[@class='details']"):
            sleep(1)

            yield {
                "name": monitorSelectXPath.xpath("//a[@class='name']/text()").extract_first()
            }

I believe the selectors are right, and the same logic works when I use CSS selectors. With XPath selectors, however, every yielded item contains the same product name.

Output below:

2020-05-07 23:04:17 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.ekupi.hr/hr/Ra%C4%8Dunala/Ra%C4%8Dunala-i-periferija/Monitori/c/10077>
{'name': '\n\t\t\t\t\tAcer monitor Nitro VG240Ybmiix UM.QV0EE.001, IPS, 1ms, AMD FreeSync, ZeroFrame, Zvučnici, HDMIx2, 23.8"'}
2020-05-07 23:04:18 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.ekupi.hr/hr/Ra%C4%8Dunala/Ra%C4%8Dunala-i-periferija/Monitori/c/10077>
{'name': '\n\t\t\t\t\tAcer monitor Nitro VG240Ybmiix UM.QV0EE.001, IPS, 1ms, AMD FreeSync, ZeroFrame, Zvučnici, HDMIx2, 23.8"'}
2020-05-07 23:04:19 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.ekupi.hr/hr/Ra%C4%8Dunala/Ra%C4%8Dunala-i-periferija/Monitori/c/10077>
{'name': '\n\t\t\t\t\tAcer monitor Nitro VG240Ybmiix UM.QV0EE.001, IPS, 1ms, AMD FreeSync, ZeroFrame, Zvučnici, HDMIx2, 23.8"'}
[... 21 more identical items, logged one second apart, omitted ...]
2020-05-07 23:04:41 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.ekupi.hr/hr/Ra%C4%8Dunala/Ra%C4%8Dunala-i-periferija/Monitori/c/10077>
{'name': '\n\t\t\t\t\tAcer monitor Nitro VG240Ybmiix UM.QV0EE.001, IPS, 1ms, AMD FreeSync, ZeroFrame, Zvučnici, HDMIx2, 23.8"'}
2020-05-07 23:04:41 [scrapy.core.engine] INFO: Closing spider (finished)
G0000000se

1 Answer

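The leading // makes the XPath expression absolute: it searches from the root of the document instead of from the current div, so every iteration returns the first matching link on the page. Remove the // to make the path relative to the node being iterated (or use .// to match at any depth below it), and update the yield statement as below.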

yield {
    "name": monitorSelectXPath.xpath("a[@class='name']/text()").extract_first()
}

Also, scrapy shell lets you test your selectors interactively before putting them in a spider. Run this in a terminal:

scrapy shell https://www.ekupi.hr/hr/Ra%C4%8Dunala/Ra%C4%8Dunala-i-periferija/Monitori/c/10077
erncnerky
  • Oh my God. Thank you in the millions. May I ask why it works without "//", and why it returns so many copies of the same item with "//"? Thank you again! EDIT: I think I figured it out in layman's terms. :) Basically, without "//" or "." it only loops over the first thing it sees and is always stuck in the same section, am I correct? – G0000000se May 07 '20 at 21:42
  • Sorry, I meant to say that without ".//" and with only "//" the code is stuck in the same section of the website. I was trying to solve this problem for about 3 hours, so please excuse my typos. – G0000000se May 07 '20 at 21:58
  • You are right. For more details, see https://stackoverflow.com/questions/35606708/what-is-the-difference-between-and-in-xpath – erncnerky May 07 '20 at 22:06