Unable to apply an if/else filter condition in Scrapy

Question

I am using Scrapy for web scraping. I can grab all the elements, but my goal is to get only the names that have more than 50 reviews. I don't know what I am doing wrong.

import scrapy


class TripadSpider(scrapy.Spider):
    name = 'tripad'
    allowed_domains = ['www.tripadvisor.in']
    start_urls = ['https://www.tripadvisor.in/Restaurants-g304554-c33-Mumbai_Maharashtra.html']
    first = 'https://www.tripadvisor.in/'

    def parse(self, response):
        for i in response.xpath("//div[@class='_2Q7zqOgW Vt o']"):
            rating = str(i.xpath(".//span[@class='w726Ki5B']/text()").get())

            if rating >= '50':
                title = i.xpath(".//a[@class='_15_ydu6b S5 H4 Cj b']/text()").getall()
                yield {
                    'title':title,
                    'rating':rating
                }
            elif rating == 'None':
                continue

        next_page = response.xpath("//a[@class='nav next rndBtn ui_button primary taLnk']/@href").get()
        if next_page:
            sequence = (self.first,next_page)
            nexturl = ''.join(sequence)
            yield scrapy.Request(url=nexturl,callback=self.parse)

Can somebody help me?

I am getting this error: ValueError: invalid literal for int() with base 10: '1,076' – Iswar Chand, Sep 02 '21 at 04:42
@sittsering The string comparison will still run, because on strings the operator compares lexicographic order – sb_, Sep 02 '21 at 04:42
This script works for a few pages, then starts misbehaving – Iswar Chand, Sep 02 '21 at 04:44
Your ValueError shows that the rating is 1,076. When comparing '1,076' >= '50', because of the comma, it will not return True. You will need to parse the rating and treat it as an integer – sb_, Sep 02 '21 at 04:48
Comparing the values as strings with less-than or greater-than does not give a numeric result. Convert the rating to an int first, like if int(rating) >= 50: – Dev, Sep 02 '21 at 04:48
Try ```int(float(rating))```. Then you will have to change your if statement to compare integers – sb_, Sep 02 '21 at 04:50
**if int(float(rating)) >= 50** gives ValueError: could not convert string to float: '1,076' – Iswar Chand, Sep 02 '21 at 04:51
You should convert the rating value to an integer, like if int(rating) >= 50: – Dev, Sep 02 '21 at 04:54
Try this in your Python console: ```rating = ''.join('1,076'.split(','))``` This splits the value 1,076 into a list ['1', '076'] and joins it back with no separator. You'll have to check whether the rating string contains a comma, as this will likely throw an error. Maybe use a try/except when setting the rating. Then you just have to parse the string as an integer! – sb_, Sep 02 '21 at 04:55
thanks @sittsering it is working using this https://stackoverflow.com/questions/5188792/how-to-check-a-string-for-specific-characters – Iswar Chand, Sep 02 '21 at 05:14
@sb_ That won't do, i.e., "101" is less than "50" by string comparison. – sittsering, Sep 02 '21 at 05:15
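
The disagreement in the comments above comes down to how Python compares strings. A minimal sketch, using made-up sample values (only '1,076' comes from the error message in this thread), puts the string comparison next to the integer comparison:

```python
# Strings are compared character by character, not by numeric value.
samples = ["9", "50", "101", "1,076"]

results = {}
for s in samples:
    as_string = s >= "50"                      # lexicographic comparison
    as_number = int(s.replace(",", "")) >= 50  # numeric comparison after removing the comma
    results[s] = (as_string, as_number)
    print(s, "string:", as_string, "int:", as_number)
```

"9" passes the string test although 9 is below 50, while "101" and "1,076" fail it although both numbers are above 50, which would explain why the spider appears to work on some pages and not on others.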

Dev · Answer 1 · 2021-09-02T05:20:40.887

0

Remove the comma, then compare as an integer

The comma in '1,076' is a thousands separator, so strip it instead of turning it into a decimal point.

# rating is the string built in the question's loop, e.g. '1,076' or 'None'
rating_digits = rating.replace(',', '')
if rating_digits.isdigit() and int(rating_digits) >= 50:
    title = i.xpath(".//a[@class='_15_ydu6b S5 H4 Cj b']/text()").getall()
    yield {
        'title': title,
        'rating': rating
    }
elif rating == 'None':
    continue
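
As a hedged sketch of the comma-stripping idea from the comments, the helper below turns the scraped text into an int before filtering. The names parse_review_count, MIN_REVIEWS, and the sample list are hypothetical and not part of the original spider. It assumes the count is the first run of digits and commas in the text.

```python
import re
from typing import Optional


def parse_review_count(text: Optional[str]) -> Optional[int]:
    """Return the review count as an int, or None if the text holds no number."""
    if text is None:
        return None
    # First run of digits, optionally containing thousands separators.
    match = re.search(r"\d[\d,]*", text)
    if match is None:
        return None
    return int(match.group().replace(",", ""))


MIN_REVIEWS = 50  # threshold taken from the question

# Hypothetical scraped values, including a missing element and the string 'None'.
scraped = ["1,076", "49", None, "50", "None", "101 reviews"]

kept = []
for text in scraped:
    count = parse_review_count(text)
    if count is not None and count >= MIN_REVIEWS:
        kept.append((text, count))

print(kept)
```

In the spider, the result of `.get()` could be passed to the helper directly, without wrapping it in `str()`, so a missing element stays `None` instead of becoming the string `'None'`.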

edited Sep 02 '21 at 05:20

answered Sep 02 '21 at 05:01

Dev

Buddy, it is not working – Iswar Chand Sep 02 '21 at 05:11
Check now, I have updated the code – Dev Sep 02 '21 at 05:20
https://dpaste.org/wrnu this worked for me – Iswar Chand Sep 02 '21 at 05:34
