Airline Price Scraping with Python

Question

I've been trying to write Python code to scrape airline prices from JFK to LAX. The URL of the prices that I want to scrape is here: https://www.google.com/flights/#search;f=JFK;t=LAX;d=2014-05-28;r=2014-06-01;tt=o

Ideally, I would be able to get a list of the airline, time of departure, and price for each flight.

I know that '<div class="GHOFUQ5BGJC">$210' corresponds to the price and '<div class="GHOFUQ5BMFC">Sun Country' corresponds to the airline.

So far, this is what I have:

import re
import urllib.request

url = "https://www.google.com/flights/#search;f=JFK;t=LAX;d=2014-05-28;r=2014-06-01;tt=o"

# Download the page exactly as the server sends it (no JavaScript is run)
with urllib.request.urlopen(url) as response:
    htmltext = response.read().decode("utf-8", errors="replace")

# Class names copied from the page as rendered in the browser
price_pattern = re.compile(r'<div class="GHOFUQ5BGJC">(.+?)</div>')
airline_pattern = re.compile(r'<div class="GHOFUQ5BMFC">(.+?)</div>')

prices = price_pattern.findall(htmltext)
airlines = airline_pattern.findall(htmltext)

print(prices)
print(airlines)

Is there a way to access the price and airline tags through Beautiful Soup? Or am I on the right track with the regex? When run, the code just gives me two empty lists.

What am I doing wrong? Thanks

Have you looked at the raw content of the html file? It doesn't contain the data unless the javascript embedded in this site is evaluated by the browser... — sebastian, May 12 '14 at 11:50
OK, thanks. It looks like the data is not contained in the raw HTML file. Is there any way to get around this? — user3628240, May 12 '14 at 11:57
@user3628240 You may want to look for other sources for data. Trying to scrape data off a Google search result requires far too much effort. — Jeroko, May 12 '14 at 12:03
@Jeroko If a website has the info in the raw HTML document but its URL does not change with the search (for example, say delta.com stays as delta.com after inputting a destination and date), is it still possible to scrape that website? Thanks — user3628240, May 12 '14 at 12:14
Generally related: http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454 Specifically related: http://stackoverflow.com/questions/10210903/can-google-flights-data-be-queried-from-google-api — Robᵩ, May 12 '14 at 15:47

Alex Watson · Answer 1 · 2023-06-15T19:36:08.950


Did you examine the raw content of the HTML file? The data is not present in it unless a browser evaluates the JavaScript embedded in the page, which is why the regexes match nothing and return empty lists.
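One way to check this without a browser is sketched below. It is only an illustration: `missing_markers`, `raw_sample` and `rendered_sample` are hypothetical names and stand-in strings, not the real Google Flights response. The first part splits the URL from the question and shows that all the search parameters sit in the fragment (the part after `#`), which an HTTP client does not send to the server. The second part checks whether the class names from the question occur in a piece of HTML at all.

```python
from urllib.parse import urlsplit

URL = ("https://www.google.com/flights/"
       "#search;f=JFK;t=LAX;d=2014-05-28;r=2014-06-01;tt=o")

parts = urlsplit(URL)

# What an HTTP client actually requests: scheme, host and path only.
requested = parts.scheme + "://" + parts.netloc + parts.path

# The search parameters all live in the fragment, which stays on the
# client side and is normally read by the page's JavaScript.
fragment = parts.fragment


def missing_markers(html_text, markers):
    """Return the markers that do not occur anywhere in html_text."""
    return [m for m in markers if m not in html_text]


# Class names taken from the question.
MARKERS = ['class="GHOFUQ5BGJC"', 'class="GHOFUQ5BMFC"']

# Hypothetical stand-ins: a script-only page as a server might send it,
# and the same page after a browser has run the script.
raw_sample = '<html><body><script src="app.js"></script></body></html>'
rendered_sample = (
    '<div class="GHOFUQ5BMFC">Sun Country</div>'
    '<div class="GHOFUQ5BGJC">$210</div>'
)

print(requested)
print(fragment)
print(missing_markers(raw_sample, MARKERS))
print(missing_markers(rendered_sample, MARKERS))
```

If `missing_markers` reports the class names as missing from the text you actually downloaded, no regex or HTML parser will find them there. In that case the page has to be rendered by something that executes JavaScript before it is parsed, or the data has to come from another source.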

edited Jun 15 '23 at 19:36

answered Jun 10 '23 at 13:45


