import scrapy

class TestSpider(scrapy.Spider):
    name = 'test'
    start_urls = ['https://go.twitch.tv/directory']
    def parse(self, response):
        for title in response.css('body'):
            yield {'title': title.css('h3.tw-box-art-card__title::text').extract()}

        for next_page in response.css('a::attr(href)'):
            yield response.follow(next_page, self.parse)

It crawls and scrapes https://go.twitch.tv/directory but doesn't output any titles.

I'm new to Python so the problem is probably really obvious but I can't figure it out.

Massaxe

1 Answer


As @Shahin mentioned, the page is generated dynamically, so you can't parse it with Scrapy alone; you would need something like Selenium or Splash. Read this.

There is also another way: you can look into how the page builds the request that returns the data you need.

For example, when the page loads or when you scroll to the bottom, it sends a request to https://gql.twitch.tv/gql with some data (see the image below): Request image

This request returns JSON describing the games in the directory: request response data. So I think you just need to find out how the request data is built, then send your request to gql.twitch.tv/gql instead of twitch.tv/directory and parse the response, which is in JSON format.
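
Since the response is JSON, there is nothing to select with CSS; you decode it and walk the resulting dicts and lists. Here is a minimal sketch for when you don't know the exact nesting yet: the helper collects every value stored under a given key. The sample payload and the key name `title` are made up for illustration, so check the real response in your browser's network tab for the actual field names.

```python
import json


def find_values(node, key):
    """Recursively yield every value stored under `key` in decoded JSON."""
    if isinstance(node, dict):
        for k, v in node.items():
            if k == key:
                yield v
            else:
                # Keep descending into nested objects and arrays.
                yield from find_values(v, key)
    elif isinstance(node, list):
        for item in node:
            yield from find_values(item, key)


# Hypothetical payload, NOT the real Twitch response shape.
sample_text = '{"data": {"games": [{"title": "Game A"}, {"title": "Game B"}]}}'
titles = list(find_values(json.loads(sample_text), "title"))
print(titles)
```

In a Scrapy callback you would pass `json.loads(response.text)` to the helper instead of the sample string.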

To see how to make a request with a body, read here (scrapy.Request takes a body argument).
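
As a rough sketch of the body argument in use, the function below builds the keyword arguments for a POST to the gql endpoint using only the standard library. The payload contents (`operationName`, `variables`) are placeholders I made up; copy the real body from the request you see in the browser.

```python
import json

GQL_URL = "https://gql.twitch.tv/gql"


def gql_request_kwargs(payload):
    """Return keyword arguments for scrapy.Request that POST `payload` as JSON."""
    return {
        "url": GQL_URL,
        "method": "POST",
        "body": json.dumps(payload),  # Request body must be a string or bytes
        "headers": {"Content-Type": "application/json"},
    }


# Hypothetical payload: replace with the body copied from the network tab.
example_payload = {"operationName": "ExampleOperation", "variables": {"limit": 30}}
kwargs = gql_request_kwargs(example_payload)
print(kwargs["method"], kwargs["url"])
```

Inside the spider's `start_requests` you could then `yield scrapy.Request(callback=self.parse_api, **gql_request_kwargs(example_payload))`, where `parse_api` is a callback name I picked for illustration. The endpoint may also expect other headers that the browser sends, so copy those too if the request gets rejected.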

vezunchik
Fidan