I have been working on a web scraper in Python to scrape Google Finance, but I can't find the specific tag I'm looking for using the find() method. Eventually I got annoyed enough that I wrote the returned data to a file so I could look for it myself. I saved it as testing.html in the same directory and opened it in Chromium so I could use the inspect tool. Within minutes, I found the element I was looking for. What am I doing wrong? My code is below:

import dryscrape

session = dryscrape.Session()


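def get(url):
    # session is only read here, so no global statement is needed
    try:
        session.visit(url)
        data = session.body()
    except Exception:
        print('Connection Failed')
        return ''  # avoid a NameError on the undefined data variable
    return str(data)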

def save(price, stockname):
    pass

def extract(data):
    return data.find('<div class="YMLKec fxKbKc">')

class following():
    apple = "https://www.google.com/finance/quote/AAPL:NASDAQ"
    tesla = "https://www.google.com/finance/quote/TSLA:NASDAQ"
    google = "https://www.google.com/finance/quote/GOOGL:NASDAQ"
    amazon = "https://www.google.com/finance/quote/AMZN:NASDAQ"
    microsoft = "https://www.google.com/finance/quote/MSFT:NASDAQ"
    netflix = "https://www.google.com/finance/quote/NFLX:NASDAQ"
    def __init__(self):
        save(extract(get(following.apple)), following.apple)
        save(extract(get(following.tesla)), following.tesla)
        save(extract(get(following.google)), following.google)
        save(extract(get(following.amazon)), following.amazon)
        save(extract(get(following.microsoft)), following.microsoft)
        save(extract(get(following.netflix)), following.netflix)

f = open("testing.html")
print(extract(f.read()))
f.close()
ShortsKing
  • Maybe the class name is generated and changes after each reload? And why aren't you using BeautifulSoup to parse the HTML? It makes the work much easier, IMHO. –  Oct 24 '21 at 16:05
  • I only need one value, so I think Beautiful Soup is kind of overkill. And I checked: the class has been exactly the same for the past three days. @js-on – ShortsKing Oct 24 '21 at 16:11
  • It could be useful to see the slice of the HTML source that you're looking for. Could you add it to your question? – cards Oct 24 '21 at 16:17
  • Can you verify that the string you're looking for is present in testing.html? If you found it with the inspection tools, the content may be loaded by JavaScript and can't be fetched that easily. That's only a theory; I don't know how dryscrape works. –  Oct 24 '21 at 16:20
  • It seems as though your theory is correct. How would I track down where to get the actual data? @js-on – ShortsKing Oct 24 '21 at 16:32
  • Delete the site's cache and cookies, clear all entries in the network tab of your inspection tool, and reload the page with Ctrl + F5. Once all data has been fetched, search for the string and you'll hopefully find the actual source of the data. –  Oct 24 '21 at 16:34
  • The value is within the downloaded page, but I can't find it in Python either. –  Oct 24 '21 at 16:46

2 Answers

Why not use the requests and BeautifulSoup libraries instead? The price is available in the data-last-price attribute, so there is no need to rely on the class name. Here is what I mean:

import requests
from bs4 import BeautifulSoup


class following():

    def __init__(self):
        self.session = requests.Session()
        self.session.headers['User-Agent'] = 'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.104 Safari/537.36'

    def get_last_price(self,link):
        r = self.session.get(link)
        return BeautifulSoup(r.text,"lxml")

    def extract(self,soup):
        return soup.select_one("[data-exchange='NASDAQ']")['data-last-price']


if __name__ == '__main__':
    base = "https://www.google.com/finance/quote/{}:NASDAQ"
    scraper = following()

    for ticker in ['AAPL','TSLA','GOOGL','AMZN','MSFT','NFLX']:
        soup_object = scraper.get_last_price(base.format(ticker))
        print(scraper.extract(soup_object))
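
If an extra dependency feels like overkill for a single value, roughly the same attribute lookup can be sketched with the standard library's html.parser. This is only an illustration: LastPriceParser and extract_last_price are names made up here, and the sample markup is invented to mirror the data-exchange / data-last-price attributes used above, not copied from the real page.

```python
from html.parser import HTMLParser


class LastPriceParser(HTMLParser):
    """Hypothetical helper: grab data-last-price from a tag with a matching data-exchange."""

    def __init__(self, exchange):
        super().__init__()
        self.exchange = exchange
        self.last_price = None

    def handle_starttag(self, tag, attrs):
        if self.last_price is not None:
            return  # a price was already found; ignore the rest of the page
        attr_map = dict(attrs)  # attrs arrives as a list of (name, value) pairs
        if attr_map.get("data-exchange") == self.exchange:
            self.last_price = attr_map.get("data-last-price")


def extract_last_price(html, exchange="NASDAQ"):
    """Return the data-last-price value as a string, or None if nothing matches."""
    parser = LastPriceParser(exchange)
    parser.feed(html)
    return parser.last_price


# Invented sample markup, for illustration only.
sample = '<div data-exchange="NASDAQ" data-last-price="149.32"><span>x</span></div>'
print(extract_last_price(sample))
```

Returning None when nothing matches makes a missing attribute explicit, which is easier to check than an exception raised halfway through a loop over tickers.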
SMTH

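Found the issue: the class name is YMlKec, with a lowercase l, not YMLKec with a capital L. str.find() is case-sensitive, so the misspelled string never matched and find() returned -1. With the correct spelling, the class is found in the saved file: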

# Read the saved page; the with block closes the file automatically
with open("testing.html", "r") as f:
    data = f.read()
class_ = "YMlKec fxKbKc"
print(data.find(class_))
# Output: 992880
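
A quick way to catch this kind of typo is to compare a case-sensitive search with a case-insensitive one. The sketch below assumes a made-up one-line stand-in for the saved page (only the class name comes from this thread), and find_ignoring_case is a hypothetical helper, not part of any library.

```python
import re


def find_ignoring_case(haystack, needle):
    """Return the text that matches needle case-insensitively, or None."""
    match = re.search(re.escape(needle), haystack, flags=re.IGNORECASE)
    return match.group(0) if match else None


# Made-up stand-in for the saved page; the price is invented.
html = '<div class="YMlKec fxKbKc">$149.32</div>'

wrong = '<div class="YMLKec fxKbKc">'  # capital L, as in the question
right = '<div class="YMlKec fxKbKc">'  # lowercase l, as in the page

print(html.find(wrong))  # -1, because str.find is case-sensitive
print(html.find(right))  # 0, the match starts at the beginning of the sample

# The case-insensitive search shows the spelling actually used in the markup.
print(find_ignoring_case(html, wrong))
```

If the case-insensitive search finds something while find() returns -1, the search string differs from the page only in letter case.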