
I've been trying to write Python code to scrape airline prices from JFK to LAX. The page whose prices I want to scrape is: https://www.google.com/flights/#search;f=JFK;t=LAX;d=2014-05-28;r=2014-06-01;tt=o

Ideally, I would get a list with the airline name, departure time and price for each flight.

I know that '<div class="GHOFUQ5BGJC">$210' corresponds to the price and '<div class="GHOFUQ5BMFC">Sun Country' corresponds to the airline.

So far, this is what I have:

import re
import urllib.request

url = "https://www.google.com/flights/#search;f=JFK;t=LAX;d=2014-05-28;r=2014-06-01;tt=o"

# Download the page; read() returns bytes in Python 3, so decode to text
with urllib.request.urlopen(url) as htmlfile:
    htmltext = htmlfile.read().decode("utf-8", errors="replace")

# Non-greedy capture of the text inside each price div
pattern1 = re.compile(r'<div class="GHOFUQ5BGJC">(.+?)</div>')
price = pattern1.findall(htmltext)

# Same idea for the airline-name div
pattern2 = re.compile(r'<div class="GHOFUQ5BMFC">(.+?)</div>')
airline = pattern2.findall(htmltext)

print(price)
print(airline)

Is there a way to access the price and airline tags with Beautiful Soup, or am I on the right track with the regex? When I run the code, it just returns two empty lists.

What am I doing wrong? Thanks
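For reference, once you have HTML that actually contains the rendered results (for example, a page saved from the browser), the two tags can be read with a real parser instead of a regex. In Beautiful Soup that would be soup.find_all("div", class_="GHOFUQ5BGJC") followed by .get_text() on each match. The sketch below does the same with the standard library's html.parser so it has no third-party dependency. The class names are the ones quoted above and are assumed to be auto-generated, so they will likely change; the helper names DivTextCollector and extract are made up for illustration.

```python
from html.parser import HTMLParser

# Class names quoted in the question; treat them as placeholders that can change.
PRICE_CLASS = "GHOFUQ5BGJC"
AIRLINE_CLASS = "GHOFUQ5BMFC"


class DivTextCollector(HTMLParser):
    """Collect the text of <div> elements whose class list contains a wanted class."""

    def __init__(self, wanted):
        super().__init__()
        self.wanted = set(wanted)
        self.results = {name: [] for name in self.wanted}
        self._stack = []  # one entry per open <div>: the matched class, or None

    def handle_starttag(self, tag, attrs):
        if tag != "div":
            return
        classes = (dict(attrs).get("class") or "").split()
        match = next((c for c in classes if c in self.wanted), None)
        self._stack.append(match)
        if match:
            self.results[match].append("")  # start a new text slot for this div

    def handle_endtag(self, tag):
        if tag == "div" and self._stack:
            self._stack.pop()

    def handle_data(self, data):
        # Attribute text to the innermost still-open div that matched.
        for match in reversed(self._stack):
            if match:
                self.results[match][-1] += data
                break


def extract(html_text):
    """Return (airline, price) pairs in document order."""
    parser = DivTextCollector([PRICE_CLASS, AIRLINE_CLASS])
    parser.feed(html_text)
    parser.close()
    prices = [s.strip() for s in parser.results[PRICE_CLASS]]
    airlines = [s.strip() for s in parser.results[AIRLINE_CLASS]]
    return list(zip(airlines, prices))


if __name__ == "__main__":
    # Made-up sample built from the snippet quoted in the question.
    sample = ('<div class="row"><div class="GHOFUQ5BMFC">Sun Country</div>'
              '<div class="GHOFUQ5BGJC">$210</div></div>')
    print(extract(sample))  # [('Sun Country', '$210')]
```

Pairing by position with zip assumes every result row has exactly one airline div and one price div; if the counts differ, zip silently drops the extras, so a real scraper would group by the enclosing row element instead.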

user3628240
  • Have you looked at the raw content of the HTML file? It doesn't contain the data unless the JavaScript embedded in this site is evaluated by the browser... – sebastian May 12 '14 at 11:50
  • OK, thanks, it looks like the data is not contained in the raw HTML file. Is there any way to get around this? – user3628240 May 12 '14 at 11:57
  • @user3628240 You may want to look for other sources for data. Trying to scrape data off a Google search result requires far too much effort. – Jeroko May 12 '14 at 12:03
  • @Jeroko If a website has the info in the raw HTML document but its URL does not change (for example, delta.com stays as delta.com after entering a destination and date), is it still possible to scrape it? Thanks – user3628240 May 12 '14 at 12:14
  • Generally related: http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454 Specifically related: http://stackoverflow.com/questions/10210903/can-google-flights-data-be-queried-from-google-api – Robᵩ May 12 '14 at 15:47
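One way to act on sebastian's suggestion is to download the page and check whether the class names from the question occur in it at all. This is a minimal sketch; the helper names fetch_html and missing_markers are made up for illustration, and the markers are simply the class names quoted in the question. Because missing_markers only inspects a string, it works just as well on a page saved from the browser.

```python
import urllib.request

# Class names quoted in the question.
MARKERS = ["GHOFUQ5BGJC", "GHOFUQ5BMFC"]


def fetch_html(url):
    """Download url and decode the body as text; no JavaScript is executed."""
    with urllib.request.urlopen(url) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")


def missing_markers(html_text, markers=MARKERS):
    """Return the markers that do not occur anywhere in html_text."""
    return [m for m in markers if m not in html_text]
```

Calling print(missing_markers(fetch_html(url))) with the URL from the question lists the class names that are absent from the downloaded HTML. If both come back, the regexes cannot match no matter how they are written, which is what the comments above describe.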

1 Answer


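Did you examine the raw content of the HTML file? The data is not in it. The flight results are inserted by JavaScript that the browser runs after the page loads, and urllib only downloads the HTML without executing any scripts, so both regular expressions have nothing to match.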
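A related detail shows why the server cannot send the results for this URL at all. Everything after the "#" is a URL fragment, which the client keeps to itself and does not send in the HTTP request (RFC 3986, section 3.5). urlopen therefore asks Google only for /flights/, and the route and dates reach the site only through the page's own JavaScript. The standard library makes the split visible; nothing below touches the network.

```python
from urllib.parse import urlsplit
from urllib.request import Request

URL = "https://www.google.com/flights/#search;f=JFK;t=LAX;d=2014-05-28;r=2014-06-01;tt=o"

# Split the URL into its components; the search parameters land in .fragment.
parts = urlsplit(URL)
print("path:    ", parts.path)      # /flights/
print("fragment:", parts.fragment)  # search;f=JFK;t=LAX;d=2014-05-28;r=2014-06-01;tt=o

# Building a Request does not connect anywhere. Its selector is the path that
# would go on the HTTP request line, and the fragment is not part of it.
req = Request(URL)
print("host:    ", req.host)        # www.google.com
print("selector:", req.selector)    # /flights/
```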
