I am trying to get the price data from the following URL. However, I can only seem to get the text from 'div' elements down to a certain level. Here is my code:
from selenium import webdriver
from bs4 import BeautifulSoup


def scrape_flight_prices(URL):
    browser = webdriver.PhantomJS()
    # Load the page, then parse the HTML snapshot
    browser.get(URL)
    soup = BeautifulSoup(browser.page_source, "lxml")
    page_divs = soup.findAll("div", attrs={'id': 'app-root'})
    for p in page_divs:
        print(p)


if __name__ == '__main__':
    URL1 = "https://www.skyscanner.net/transport/flights/brs/gnb/190216/190223/?adults=1&children=0&adultsv2=1&childrenv2=&infants=0&cabinclass=economy&rtn=1&preferdirects=false&outboundaltsenabled=false&inboundaltsenabled=false&ref=home#results"
    scrape_flight_prices(URL1)
And here is the output:
<div id="app-root">
  <section class="day-content state-loading state-no-results" id="daysection">
    <div class="day-searching">
      <div class="hot-spinner medium"></div>
      <div class="day-searching-message">Searching</div>
    </div>
  </section>
</div>
The section of HTML I want to scrape is on this page:
https://www.skyscanner.net/transport/flights/brs/gnb/190216/190223/?adults=1&children=0&adultsv2=1&childrenv2=&infants=0&cabinclass=economy&rtn=1&preferdirects=false&outboundaltsenabled=false&inboundaltsenabled=false&ref=home#results
However, when I try to scrape it with the following code:
prices = soup.findAll("a", attrs={'target':"_blank", "data-e2e":"itinerary-price", "class":"CTASection__price-2bc7h price"})
for p in prices:
    print(p)
It prints nothing! I suspect JavaScript is running to generate the rest of the code and/or data. Can anyone help me extract the data? Specifically, I am trying to get the price, flight times, airline name, etc., but if Beautiful Soup is not seeing the relevant HTML from the page, I'm not sure how else to get it.
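If that suspicion is right, the snapshot is being taken while the page is still in its "Searching" state, so one thing to try is polling the page source until the placeholder is gone. The following is only a minimal sketch under assumptions: `wait_for_rendered_html` is a hypothetical helper (not part of Selenium or Beautiful Soup), and it assumes the `state-loading` class from the output above and the `itinerary-price` attribute from the selector are reliable markers of "loading" and "loaded". Selenium also ships its own explicit-wait helper, `WebDriverWait`, which serves the same purpose.

```python
import time


def wait_for_rendered_html(get_html, timeout=30.0, interval=0.5):
    """Poll get_html() until the page no longer looks like the loading placeholder.

    get_html is any zero-argument callable returning the current page HTML.
    Returns the HTML once it looks ready, or raises TimeoutError.
    """
    deadline = time.monotonic() + timeout
    while True:
        html = get_html()
        # Assumed markers, both taken from the question: "state-loading" appears
        # in the placeholder output, "itinerary-price" on the target price links.
        if "state-loading" not in html and "itinerary-price" in html:
            return html
        if time.monotonic() >= deadline:
            raise TimeoutError("page still loading after %.1f s" % timeout)
        time.sleep(interval)
```

With the scraper above, this might be called as `html = wait_for_rendered_html(lambda: browser.page_source)` after `browser.get(URL)`, with `html` then passed to `BeautifulSoup` in place of `browser.page_source`. Whether the prices ever appear in the headless browser at all is a separate question that this sketch does not answer.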
Would appreciate any pointers! Many thanks in advance!