I've been trying to write Python code to scrape airline prices for flights from JFK to LAX. The prices I want to scrape are on this page: https://www.google.com/flights/#search;f=JFK;t=LAX;d=2014-05-28;r=2014-06-01;tt=o
Ideally, I would get a list with the airline, departure time and price of each flight.
I know that '<div class="GHOFUQ5BGJC">$210' corresponds to the price and '<div class="GHOFUQ5BMFC">Sun Country' corresponds to the airline.
So far, this is what I have:
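The two regexes do match this markup when it is actually present in the string being searched. Below is a minimal check against a hand-written snippet. The snippet is hypothetical, built only from the class names above and not copied from the live page.

```python
import re

# Hypothetical markup built from the class names above, not copied from the page
sample = (
    '<div class="GHOFUQ5BMFC">Sun Country</div>'
    '<div class="GHOFUQ5BGJC">$210</div>'
)

# Same patterns as in the question. Without re.DOTALL, '.' does not match a
# newline, so div text that spans several lines would be skipped.
price_pattern = re.compile(r'<div class="GHOFUQ5BGJC">(.+?)</div>')
airline_pattern = re.compile(r'<div class="GHOFUQ5BMFC">(.+?)</div>')

print(price_pattern.findall(sample))    # ['$210']
print(airline_pattern.findall(sample))  # ['Sun Country']
```

If the patterns work on a snippet like this but return nothing on the downloaded page, the problem is more likely the fetched HTML than the regexes.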
import re
import urllib.request

url = "https://www.google.com/flights/#search;f=JFK;t=LAX;d=2014-05-28;r=2014-06-01;tt=o"
# Fetch the page and decode the bytes to str (assuming UTF-8) so re can search it
with urllib.request.urlopen(url) as response:
    htmltext = response.read().decode("utf-8", errors="replace")

# Capture the text inside the price and airline divs
price_pattern = re.compile(r'<div class="GHOFUQ5BGJC">(.+?)</div>')
airline_pattern = re.compile(r'<div class="GHOFUQ5BMFC">(.+?)</div>')
price = price_pattern.findall(htmltext)
airline = airline_pattern.findall(htmltext)
print(price)
print(airline)
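One likely reason for the empty lists is the URL itself. Everything after the '#' is a fragment, and HTTP clients (browsers and urllib alike) do not send the fragment to the server, so the request asks for plain /flights/ with none of the search parameters. The results on that page are most likely filled in by JavaScript in the browser, which urllib does not run. The check below uses only the standard library's urlsplit and urllib.request.Request, with no network access, to show what actually gets requested.

```python
from urllib.parse import urlsplit
from urllib.request import Request

url = "https://www.google.com/flights/#search;f=JFK;t=LAX;d=2014-05-28;r=2014-06-01;tt=o"

# Split the URL into its parts; everything after '#' ends up in .fragment
parts = urlsplit(url)
print(parts.path)      # /flights/
print(parts.fragment)  # search;f=JFK;t=LAX;d=2014-05-28;r=2014-06-01;tt=o

# urllib strips the fragment when it builds a request: the selector (the path
# sent on the HTTP request line) carries none of the search parameters.
req = Request(url)
print(req.selector)    # /flights/
```

A quick way to confirm this on the real response is to test whether 'GHOFUQ5BGJC' in htmltext is True; if the class name is not in the downloaded text at all, neither regex nor an HTML parser can find it.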
Is there a way to get at the price and airline elements with Beautiful Soup? Or am I on the right track with regex? When I run the code, it just prints two empty lists.
What am I doing wrong? Thanks
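On the Beautiful Soup question: with bs4 installed, soup = BeautifulSoup(htmltext, "html.parser") followed by soup.find_all("div", class_="GHOFUQ5BGJC") selects elements by class. The sketch below does the same job with the standard library's html.parser instead, so it runs without extra packages. It collects the text of the price and airline divs and pairs them by position. It assumes the class names quoted above are still valid and that both lists come out in the same order, one entry per flight. The departure time is left out because its class name isn't given. Like the regexes, it only helps once the fetched HTML actually contains these divs.

```python
from html.parser import HTMLParser

# Class names quoted in the question (an assumption: obfuscated names like
# these can change whenever the site is rebuilt).
PRICE_CLASS = "GHOFUQ5BGJC"
AIRLINE_CLASS = "GHOFUQ5BMFC"


class ClassTextCollector(HTMLParser):
    """Collect the text of <div> elements whose class list contains a target class.

    Assumes each target div holds plain text rather than nested divs.
    """

    def __init__(self, targets):
        super().__init__()
        self.results = {name: [] for name in targets}
        self._current = None  # target class of the div being read, if any

    def handle_starttag(self, tag, attrs):
        if tag != "div":
            return
        # attrs is a list of (name, value) pairs; value can be None
        classes = (dict(attrs).get("class") or "").split()
        for name in classes:
            if name in self.results:
                self._current = name
                self.results[name].append("")
                break

    def handle_endtag(self, tag):
        if tag == "div":
            self._current = None

    def handle_data(self, data):
        if self._current is not None:
            self.results[self._current][-1] += data


def extract_flights(html_text):
    """Return (airline, price) pairs found in html_text."""
    parser = ClassTextCollector([AIRLINE_CLASS, PRICE_CLASS])
    parser.feed(html_text)
    parser.close()
    airlines = [text.strip() for text in parser.results[AIRLINE_CLASS]]
    prices = [text.strip() for text in parser.results[PRICE_CLASS]]
    # Pair by position: assumes one airline div and one price div per flight
    return list(zip(airlines, prices))


# Hypothetical markup, not taken from the live page
sample = (
    '<div class="GHOFUQ5BMFC">Sun Country</div>'
    '<div class="GHOFUQ5BGJC">$210</div>'
)
print(extract_flights(sample))  # [('Sun Country', '$210')]
```

Matching on the class list rather than the exact attribute string means a div such as class="x GHOFUQ5BMFC" is still picked up, which the regexes would miss.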