I am doing a project for British Airways, using reviews from www.airlinequality.com.

Please take a look at my code. It does not raise any errors, but it does not scrape anything either.

I think the problem is in the item.find calls in the code.

Can someone look at the website? I am really struggling to find the right tags and attributes.

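import requests
from bs4 import BeautifulSoup

url = 'https://www.airlinequality.com/airline-reviews/british-airways/page/1/?sortby=post_date%3ADesc&pagesize=100'

def get_soup(url):
    # Fetch the page through the rendering service running on localhost:8050.
    # Assumption: if this service is Splash, rendered HTML is served from
    # /render.html, so the bare root URL may not return the review page.
    r = requests.get('http://localhost:8050', params={'url': url})
    return BeautifulSoup(r.text, 'lxml')

reviewlist = []

def get_reviews(soup):
    reviews = soup.find_all('div', {'itemprop': 'reviewBody'})
    for item in reviews:
        try:
            # Use a separate name so the dict does not overwrite `reviews`.
            review = {
                'rating': item.find('div', {'itemprop': 'reviewRating'}),
                'seat_type': item.find('td', {'class': 'review-value'}),
                'body': item.find('div', {'class': 'text_content'}).text.strip(),
                'recommended': item.find('td', {'class': 'review-rating-header recommended'})
            }
        except AttributeError as err:
            # A find() call returned None; report it and skip only this item.
            print(f'Skipping review: {err}')
            continue
        reviewlist.append(review)

for x in range(1, 100):
    soup = get_soup(f'https://www.airlinequality.com/airline-reviews/british-airways/page/{x}/?sortby=post_date%3ADesc&pagesize=100')
    print(f'Getting page: {x}')
    get_reviews(soup)
    print(len(reviewlist))
    # Stop once the pagination contains a disabled ("off") link.
    if soup.find('li', {'class': 'off'}):
        break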

Getting page: 1
0
Getting page: 2
0
Getting page: 3
0
Getting page: 4
0
Getting page: 5
0
Getting page: 6
0
Getting page: 7
0
Getting page: 8
0
Getting page: 9
0
Getting page: 10
0

  • `except: pass` is a really bad practice. You may want to change that. – Barry the Platipus Jan 26 '23 at 19:40
  • Welcome to Stack Overflow! This is a good opportunity for you to start familiarizing yourself with [using a debugger](https://stackoverflow.com/q/25385173/328193). When you step through the code in a debugger, which operation first produces an unexpected result? What were the values used in that operation? What was the result? What result was expected? Why? To learn more about this community and how we can help you, please start with the [tour] and read [ask] and its linked resources. – David Jan 26 '23 at 19:41
  • @Barry the Platipus why so? I am new to Python, and saw this in a tutorial, so that no foreign-language reviews are taken into account – Farhad Mustafayev Jan 26 '23 at 19:49
  • @Farhad Mustafayev generally you want to specify which error specifically you want to handle or skip to avoid unexpected issues failing silently. – Brian Karabinchak Jan 26 '23 at 19:50
  • I think your variable name is colliding with itself. First you have `reviews = soup.find_all('div', {'itemprop':'reviewBody'})`, then you have `for item in reviews`. So far this is fine. But then you use `reviews` again as a dictionary. I would recommend using a different variable name for that and seeing how things go, just to start. Try printing it before you append it to the list as well, to see what it is. – Brian Karabinchak Jan 26 '23 at 19:52
  • That is likely a bad tutorial, @FarhadMustafayev. See https://stackoverflow.com/questions/21553327/why-is-except-pass-a-bad-programming-practice, and countless other discussions on the subject. Just don't do it, it's a sign of mediocre, lazy programming. – Barry the Platipus Jan 26 '23 at 19:56
  • @Brian Karabinchak I see your point. But the fact is I wrote this entire code when I was scraping Amazon, and it works just fine. But when I moved to www.airlinequality.com, it stopped working. I think the problem is in the tags. I just don't know what to look for – Farhad Mustafayev Jan 26 '23 at 19:59
  • Are you sure `reviews = soup.find_all('div', {'itemprop':'reviewBody'})` is giving you anything back? It seems like it's not; otherwise, worst case, you would have a list of empty dictionaries, which should still give you a non-zero len. – Brian Karabinchak Jan 26 '23 at 20:13
  • @Brian Karabinchak It seems like it doesn't give anything back. Can you please check this website "https://www.airlinequality.com/airline-reviews/british-airways" ? I really don't know which tag to choose – Farhad Mustafayev Jan 27 '23 at 10:14
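
The checks suggested in the comments above can be tried offline before touching the live site. The following is only a sketch under stated assumptions: `SAMPLE_HTML` is made-up markup, not copied from airlinequality.com, and the helper names (`text_or_none`, `parse_reviews`, `parse_reviews_from_leaf`) are hypothetical. It shows two things: counting the matched containers first, and how calling `item.find(...)` on an element that has no children returns `None`, which a bare `except: pass` would then hide.

```python
from bs4 import BeautifulSoup

# Hypothetical markup (an assumption, not the real page): the element carrying
# itemprop="reviewBody" is a leaf that holds only the review text.
SAMPLE_HTML = """
<article itemprop="review">
  <div itemprop="reviewRating"><span itemprop="ratingValue">7</span></div>
  <div class="text_content" itemprop="reviewBody">Good flight, friendly crew.</div>
</article>
<article itemprop="review">
  <div class="text_content" itemprop="reviewBody">Delayed by two hours.</div>
</article>
"""


def text_or_none(tag):
    """Return the stripped text of a tag, or None when the lookup found nothing."""
    return tag.get_text(strip=True) if tag is not None else None


def parse_reviews(html):
    """Iterate over the outer review containers and look up fields inside each."""
    soup = BeautifulSoup(html, "html.parser")
    containers = soup.find_all("article", {"itemprop": "review"})
    # A count of 0 here means the selector or the fetched HTML is wrong.
    print(f"containers found: {len(containers)}")
    parsed = []
    for item in containers:
        parsed.append({
            "rating": text_or_none(item.find("span", {"itemprop": "ratingValue"})),
            "body": text_or_none(item.find("div", {"itemprop": "reviewBody"})),
        })
    return parsed


def parse_reviews_from_leaf(html):
    """Mimic the question's loop: search inside the reviewBody element itself."""
    soup = BeautifulSoup(html, "html.parser")
    parsed = []
    for item in soup.find_all("div", {"itemprop": "reviewBody"}):
        # find() only searches descendants, and this div has no child div.
        inner = item.find("div", {"class": "text_content"})
        parsed.append(text_or_none(inner))
    return parsed


results = parse_reviews(SAMPLE_HTML)
leaf_results = parse_reviews_from_leaf(SAMPLE_HTML)
```

If the count printed for the real page is 0, the fetched HTML is worth saving to a file and inspecting before changing any selectors, since the problem may be in the request rather than in the tags.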
