
Please find my code below:

# Parse a new post.
def parse_new_post(self, response, review, created_at, data):
    data.update({
        'cool_count': self.set_int(review.css('a[rel=cool]').css('span[class=count]::text').extract()),
        'created_at': self.set_date(review.css('meta[itemprop=datePublished]::attr(content)').extract()[0]),
        'elite': len(review.css('.is-elite')) == 1,
        'funny_count': self.set_int(review.css('a[rel=funny]').css('span[class=count]::text').extract()),
        'owner_comment_text': self.set_text(review.css('span[class=js-content-toggleable\ hidden]::text').extract()).replace("\n", " "),
        #'rating': review.css('div[itemprop=reviewRating]').css('div').css('i::attr(title)').re('(\\d\.\\d)'),
        'rating': review.css('div[itemprop=reviewRating]').css('meta').css('::attr(content)').re('(\\d\.\\d)')[0].encode('utf-8'),
        #'review_id': review.css('div::attr(data-review-id)').extract()[0].encode('utf-8'),
        #'review_id': review.xpath('.//div[contains(@class,"review review--with-sidebar")]/@data-review-id').extract(),
        'review_text': self.set_text(review.css('p[itemprop=description]::text').extract()).replace("\n", " "),
        'total_friends': self.set_int(review.css('li[class=friend-count]').css('b::text').extract()),
        #'total_friends': int(review.xpath('.//li[contains(@class,"friend-count")]/span/b/text()').extract()[0].strip()),
        #'total_reviews': int(review.xpath('.//li[contains(@class,"review-count")]/span/b/text()').extract()[0].strip()),
        #'total_friends': int(review.xpath('.//li[contains(@class,"friend-count")]/b/text()').extract()[0].strip()),
        #'total_reviews': int(review.xpath('.//li[contains(@class,"review-count")]/b/text()').extract()[0].strip()),
        'total_reviews': self.set_int(review.css('li[class=review-count]').css('b::text').extract()),
        'user_id': review.css('div[class*=photo-box]').css('a::attr(href)').extract(),
        'useful_count': self.set_int(review.css('a[rel=useful]').css('span[class=count]::text').extract()),
        #'user_location': review.css('li[class=user-location]').css('b::text').extract()[0].encode('utf-8'),
        'user_location': review.xpath('.//li[@class="user-location responsive-hidden-small"]/b/text()').extract(),
        'username': review.css('meta[itemprop=author]::attr(content)').extract()[0].encode('utf-8'),
        'review_id': review.xpath('.//div[contains(@class,"review review--with-sidebar")]/@data-review-id').extract()[0].encode('utf-8'),
    })

When I am crawling the website, I get the error below:

2016-11-22 01:27:52 [scrapy] ERROR: Spider error processing <GET https://www.yelp.com/biz/lexus-of-glendale-glendale?utm_campaign=yelp_api&utm_medium=api_v2_phone_search&utm_source=HPtU-ro8MXX3MOY_DQkP6A?sort_by=date_desc> (referer: None)
Traceback (most recent call last):
  File "/usr/lib64/python2.7/site-packages/scrapy/utils/defer.py", line 102, in iter_errback
    yield next(it)
  File "/usr/lib64/python2.7/site-packages/scrapy/spidermiddlewares/offsite.py", line 28, in process_spider_output
    for x in result:
  File "/usr/lib64/python2.7/site-packages/scrapy/spidermiddlewares/referer.py", line 22, in <genexpr>
    return (_set_referer(r) for r in result or ())
  File "/usr/lib64/python2.7/site-packages/scrapy/spidermiddlewares/urllength.py", line 37, in <genexpr>
    return (r for r in result or () if _filter(r))
  File "/usr/lib64/python2.7/site-packages/scrapy/spidermiddlewares/depth.py", line 54, in <genexpr>
    return (r for r in result or () if _filter(r))
  File "/c360/apps/c360nextgen/src/crawlers/yelp_new/yelp_new/spiders/lexus_posts.py", line 85, in parse
    yield self.check_for_new_post(response,review,created_at,data)
  File "/c360/apps/c360nextgen/src/crawlers/yelp_new/yelp_new/spiders/lexus_posts.py", line 95, in check_for_new_post
    return self.parse_new_post(response,review,created_at,data)
  File "/c360/apps/c360nextgen/src/crawlers/yelp_new/yelp_new/spiders/lexus_posts.py", line 123, in parse_new_post
    'review_id':review.xpath('.//div[contains(@class,"review review--with-sidebar")]/@data-review-id').extract()[0].encode('utf-8'),
IndexError: list index out of range
Varun rai
  • Hi, welcome! I think you should provide the code of your application where the crash happens; in that case the community will try to help you. – wolendranh Nov 22 '16 at 09:24
  • `review.xpath('.//div[contains(@class,"review review--with-sidebar")]/@data-review-id').extract()` is an empty list – Chr Nov 22 '16 at 09:24
  • Please format this as code so we can read it. – cco Nov 22 '16 at 09:25
  • While I'm fetching it, it's returning a value. Can you please help me with the code? – Varun rai Nov 22 '16 at 09:35

3 Answers


Your query returns an empty list, so it can't find the first element [0] and throws an IndexError. You need to fix the selector you use to extract the review id.
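
For example, a minimal sketch of one way to guard against the empty result, assuming the same review and data objects as in the question (Scrapy's extract_first() returns a default value instead of raising when nothing matches):

# extract_first() returns the default instead of raising when nothing matches.
review_id = review.xpath(
    './/div[contains(@class,"review review--with-sidebar")]/@data-review-id'
).extract_first(default='')
data['review_id'] = review_id.encode('utf-8') if review_id else ''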

Sven
  • Can you please help me figure out where I should fix it? – Varun rai Nov 22 '16 at 09:30
  • I assume something in `review.xpath('.//div[contains(@class,"review review--with-sidebar")]/@data-review-id').extract()` is wrong, but you need to check your xpath yourself. – Sven Nov 22 '16 at 09:35
  • While I'm using `.//div[contains(@class,"review review--with-sidebar")]/@data-review-i`, I am getting data in a list. – Varun rai Nov 22 '16 at 09:37

The IndexError: list index out of range simply means that you are trying to access an index/item in the list that doesn't exist.

Here is an example:

my_list = [1, 2]
print(my_list[4])

Notice that there isn't a fifth item in the list, so it will result in: IndexError: list index out of range.

In your case, it returns an empty list, and you try to take its first element with [0] (index 0 is the first item, but there are no items in the list at all).
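
For instance, here is a small sketch of two common ways to handle a possibly empty list before indexing it (the variable names are just illustrative):

my_list = []

# Option 1: only index when the list is non-empty
first = my_list[0] if my_list else None

# Option 2: catch the error explicitly
try:
    first = my_list[0]
except IndexError:
    first = None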

Sebastian Nielsen

Add a check on the review list:

review_list = review.xpath('.//div[contains(@class,"review review--with-sidebar")]/@data-review-id').extract()
if review_list:
    data['review_id'] = review_list[0].encode('utf-8')
else:
    data['review_id'] = ""
Chr