How to scrape link within site using scrapy

Question

I'm trying to use scrapy to scrape from a site, and a link within the content of the site. However, when I do this I get an error on the line above the yield statemant in parse:
TypeError: 'NoneType' object does not support item assignment

Here is my code:

class PostsSpider(scrapy.Spider):
    name = "posts"
    start_urls = ['https://www.nba.com/teams/bucks']
    allowed_domains = ['nba.com']

    def parse(self, response):
        for post in response.css('.nba-player-index section section'):
            playerPage = response.urljoin(post.css('a').attrib['href'])
            item = yield scrapy.Request(playerPage, callback=self.helper)
            item['number'] = post.css('span.nba-player-trending-item__number::text').get(),
            yield item

    def helper(self, response):
       print("--->"+response.css("title").get())
       item = Item()
       item['title'] = response.css("title::text").get()
       yield item

class Item(scrapy.Item):
    # define the fields for your item here like:
    number = scrapy.Field()
    title = scrapy.Field()
    ppg = scrapy.Field()

Unless you intended for that method to be a [coroutine](https://stackoverflow.com/q/34469060/1431750), the line `item = yield scrapy.Request(playerPage, callback=self.helper)` is probably wrong. Or, you need to pass in a value for the first `item = yield ...` line using `send(...)`. See the linked question. Also, show the code you're using to call these methods/execute your script. — aneroid, May 30 '20 at 02:47

score 0 · Answer 1 · answered May 30 '20 at 05:20

What you can do is pass number data to helper instead of doing this way. Something like this:

def parse(self, response):
    for post in response.css('.nba-player-index section section'):
        playerPage = response.urljoin(post.css('a').attrib['href'])
        meta = response.meta.copy()
        meta['number'] = post.css('span.nba-player-trending-item__number::text').get()
        yield scrapy.Request(playerPage, callback=self.helper, meta=meta)


def helper(self, response):
       # here you will get `number` in response.meta['number'] that you can yield further.
       item = Item()
       item['number'] = response.meta.get('number)
       yield item

How to scrape link within site using scrapy

1 Answers1