When using scrapy shell, I get no data from response.xpath

Question

I am trying to scrape a betting site. However, when I check for the retrieved data in scrapy shell, I receive nothing.

The XPath to what I need is //*[@id="yui_3_5_0_1_1562259076537_31330"]. When I run it in the shell, this is what I get:


    In [18]: response.xpath('//*[@id="yui_3_5_0_1_1562259076537_31330"]')
    Out[18]: []

The output is [], but I expected something from which I could extract the href.

When I use the "inspect" tool from Chrome while the site is still loading, this id is outlined in purple. Does this mean that the site is using JavaScript? And if so, is this the reason why Scrapy does not find the item and returns []?
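One way to test the JavaScript hypothesis is to check whether the id occurs anywhere in the raw HTML that Scrapy downloaded, since Scrapy does not execute scripts. The helper below is only a sketch: `marker_report` is a hypothetical name, and `sample_html` is made-up markup standing in for `response.text`.

```python
def marker_report(html, markers):
    """Map each marker string to True if it occurs in the raw HTML body."""
    return {marker: marker in html for marker in markers}


# Stand-in for response.text: static markup with no script-generated ids.
sample_html = '<div class="events-title"><a title="Football">Football</a></div>'

report = marker_report(
    sample_html,
    ["yui_3_5_0_1_1562259076537_31330", "events-title"],
)
print(report)
# {'yui_3_5_0_1_1562259076537_31330': False, 'events-title': True}
```

In scrapy shell the same check would be `marker_report(response.text, [...])`. A marker that is visible in the browser inspector but reported as False here was most likely added by JavaScript after the page loaded.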

The site uses JavaScript to generate random ids for the elements. You can try using the class attribute or a better XPath query instead. What item are you trying to scrape? — GmrYael, Jul 04 '19 at 19:52
Do print(response.text) to see what you're really getting. Then investigate what's going on with the JS and either Splash it or Selenium it if necessary. My order of operations goes Scrapy > Splash > Selenium — ThePyGuy, Jul 05 '19 at 04:03
Also be sure to set USER_AGENT in your settings as that will be passed on to scrapy shell instances. — ThePyGuy, Jul 05 '19 at 04:36
@gmrYael initially I wanted to scrape the titles of all the live matches. Then I tried to scrape the titles of the football matches, but I got the same problem. I’ll try to scrape after class attributes and I’ll get back to you guys. Thanks! — Ale0311, Jul 05 '19 at 08:19
@ThePyGuy I tried printing the response, but I got nothing. I’ll give splash a try also and see what I get. Thanks! About USER_AGENT, why is that necessary and where to set it? — Ale0311, Jul 05 '19 at 08:21
See https://stackoverflow.com/q/8550114/939364 and https://docs.scrapy.org/en/master/topics/dynamic-content.html — Gallaecio, Jul 05 '19 at 13:25
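On the USER_AGENT question in the comments: by default Scrapy identifies itself with a user agent of the form `Scrapy/VERSION (+https://scrapy.org)`, and some sites may answer such clients with different or empty content. A minimal sketch of the setting, assuming a standard Scrapy project layout; the browser string is only an illustrative value, not one taken from this thread.

```python
# settings.py of the Scrapy project
# Hypothetical browser-like value; substitute the string your own browser sends.
USER_AGENT = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
    "AppleWebKit/537.36 (KHTML, like Gecko) "
    "Chrome/75.0.3770.100 Safari/537.36"
)
```

When scrapy shell is started from inside the project directory it picks up the project settings, so the value applies there as well. It can also be passed for a single run with the `-s USER_AGENT=...` command-line option.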

score 0 · Accepted Answer · edited Jul 06 '19 at 12:00

I tried scraping the site using only Scrapy, and this is my result.

This is the items.py file

    import scrapy

    class LifeMatchsItem(scrapy.Item):

        Event = scrapy.Field() # Name of event
        Match = scrapy.Field() # Team1 vs Team2
        Date = scrapy.Field()  # Date of Match

This is my Spider code


    import scrapy
    from LifeMatchesProject.items import LifeMatchsItem


    class LifeMatchesSpider(scrapy.Spider):
        name = 'life_matches'
        start_urls = ['http://www.betfair.com/sport/home#sscpl=ro/']

        custom_settings = {'FEED_EXPORT_ENCODING': 'utf-8'}

        def parse(self, response):
            # Each "events-title" div is the heading of one event.
            for event in response.xpath('//div[contains(@class,"events-title")]'):
                # The matches of that event are the <li> items of the first <ul> after the heading.
                for element in event.xpath('./following-sibling::ul[1]/li'):
                    item = LifeMatchsItem()
                    item['Event'] = event.xpath('./a/@title').get()
                    item['Match'] = element.xpath('.//div[contains(@class,"event-name-info")]/a/@data-event').get()
                    # normalize-space() trims the whitespace around the date text.
                    item['Date'] = element.xpath('normalize-space(.//div[contains(@class,"event-name-info")]/a//span[@class="date"]/text())').get()
                    yield item

And this is the result

Thanks a lot! This was very helpful. However, I have one more, silly, question. How can you print the scraped data in such a format? I only managed to print it in .csv or .json format. — Ale0311, Jul 08 '19 at 09:51
Scrapy's built-in feed export formats include JSON, JSON Lines, CSV and XML: https://docs.scrapy.org/en/latest/topics/feed-exports.html#topics-feed-format — GmrYael, Jul 08 '19 at 16:37
And in what format did you print it in the photo you posted? — Ale0311, Jul 09 '19 at 17:56
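The thread does not say which format the screenshot shows. As a sketch of how the feed exports mentioned above can be configured, the fragment below writes the same items to a CSV and a JSON file. It assumes Scrapy 2.1 or later, where the `FEEDS` setting is available; the file names are hypothetical, and the field names come from the items.py in the answer.

```python
# settings.py (the same dict can also go into the spider's custom_settings)
# Assumes Scrapy 2.1+; "matches.csv" and "matches.json" are hypothetical file names.
FEEDS = {
    "matches.csv": {
        "format": "csv",
        "encoding": "utf-8",
        # Fixes the column order of the CSV output.
        "fields": ["Event", "Match", "Date"],
    },
    "matches.json": {
        "format": "json",
        "encoding": "utf-8",
    },
}
```

A CSV file written this way opens as a table in a spreadsheet program, which may be close to the layout asked about.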
