Can't figure out why my Scrapy script isn't working

Question

import scrapy

class TestSpider(scrapy.Spider):
    name = 'test'
    start_urls = ['https://go.twitch.tv/directory']
def parse(self, response):
    for title in response.css('body'):
        yield {'title': title.css('h3.tw-box-art-card__title::text').extract()}

    for next_page in response.css('a::attr(href)'):
        yield response.follow(next_page, self.parse)

It just crawls and scrapes https://go.twitch.tv/directory but doesn't put out any titles.

I'm new to Python so the problem is probably really obvious but I can't figure it out.

Because of the parse function and the command "scrapy crawl test -o test.csv" that I use to run the script — Massaxe, Oct 31 '17 at 21:35
Your code is badly indented. Fix it and you may find it helpful. — SIM, Oct 31 '17 at 21:39
@Massaxe, the content of that webpage is generated dynamically, so to capture it you need a browser simulator such as Selenium. — SIM, Oct 31 '17 at 22:03
I'm like 99% sure that the code works because if I change all the selectors and so on to use it on Wikipedia it works perfectly. I think I just have to get good. — Massaxe, Oct 31 '17 at 22:04

score 1 · Accepted Answer · edited Apr 24 '19 at 11:43

As @Shahin mentioned, the page is generated dynamically, so you can't parse it without something like Selenium or Splash. Read this.

There is also another way: you can look into how the page builds the request that returns the data you need.

For example, when the page loads or when you scroll to the bottom, a request with some data is sent to https://gql.twitch.tv/gql (look at the image below):

This request returns JSON with the descriptions of the games in the directory. So I think you just need to find out how the request data is built, send the request to gql.twitch.tv/gql instead of twitch.tv/directory, and parse the response, which is in JSON format.
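
As a rough sketch of the "parse the JSON response" step: the answer does not show the real response layout, so the sample body and every key in it (data, directory, games, title) are made-up placeholders. Replace them with whatever you actually see in the browser's network tab.

```python
import json

# Made-up sample body. The real gql.twitch.tv response layout is not shown
# in this answer, so every key below is a placeholder (assumption).
SAMPLE_BODY = """
{"data": {"directory": {"games": [
    {"title": "Game A"},
    {"title": "Game B"}
]}}}
"""


def extract_titles(body_text):
    """Return the game titles found in a JSON response body."""
    payload = json.loads(body_text)
    # Walk the (assumed) nesting with defaults so a missing key yields [].
    games = payload.get("data", {}).get("directory", {}).get("games", [])
    return [game["title"] for game in games if "title" in game]


if __name__ == "__main__":
    print(extract_titles(SAMPLE_BODY))
```

In a spider callback you would pass response.text to a function like this instead of using CSS selectors.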

To see how to make a request with a body, read here (scrapy.Request accepts a body argument).
