4

This question has been asked before, but I've searched and tried and still can't get it to work. I'm a beginner when it comes to Selenium.

Have a look at: https://finance.yahoo.com/quote/FB

I'm trying to web scrape the "Recommended Rating", which in this case at the time of writing is 2. I've tried:

driver.get('https://finance.yahoo.com/quote/FB')
time.sleep(10)
rating = driver.find_element_by_css_selector('#Col2-4-QuoteModule-Proxy > div > section > div > div > div')
print(rating.text)

...which doesn't give me an error, but doesn't print any text either. I've also tried with xpath, class_name, etc. Instead I tried:

source = driver.page_source
print(source)

This doesn't work either: I just get the original source, without the dynamically generated content. The rating isn't there when I click "View Source" in Chrome either. I also tried saving the web page from Chrome as HTML only, and that didn't work.

Then I discovered that if I save the entire webpage, including images and css-files and everything, the source code is different from the one where I just save the HTML.

The HTML-file I get when I save the entire webpage using Chrome DOES contain the information that I need, and at first I was thinking about using pyautogui to just Ctrl + S every webpage, but there must be another way.

The information that I need is obviously there in the HTML, but how do I get it without downloading the entire web page?

PLASMA chicken
PythonGeek

4 Answers

3

Try this to get the HTML after the dynamically generated content (JavaScript) has been rendered:

html = driver.execute_script("return document.body.innerHTML")

See similar question: Running javascript in Selenium using Python
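
Once the rendered HTML is available as a string, the rating still has to be picked out of it. Here is a minimal sketch using only the standard library. It assumes the rating sits in an element whose class list contains rating-text, as the other answers in this thread suggest. The sample_html fragment and its aria-label value are made up for illustration and are not copied from the live page, and RatingParser is a hypothetical helper name.

```python
from html.parser import HTMLParser


class RatingParser(HTMLParser):
    """Finds the first tag whose class list contains 'rating-text' and
    records its aria-label plus the first non-empty text that follows."""

    def __init__(self):
        super().__init__()
        self.aria_label = None
        self.text = None
        self._capturing = False

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        classes = (attr_map.get("class") or "").split()
        already_found = self._capturing or self.text is not None
        if not already_found and "rating-text" in classes:
            self.aria_label = attr_map.get("aria-label")
            self._capturing = True

    def handle_data(self, data):
        if self._capturing and data.strip():
            self.text = data.strip()
            self._capturing = False


# Hypothetical fragment standing in for the rendered page. With Selenium it
# would instead be: driver.execute_script("return document.body.innerHTML")
sample_html = (
    '<div><a data-test="recommendation-rating-header">Recommendation Rating</a>'
    '<div><div class="rating-text Arrow South Fw(b)" '
    'aria-label="2 on a scale of 1 to 5">2</div></div></div>'
)

parser = RatingParser()
parser.feed(sample_html)
print(parser.text, "|", parser.aria_label)
```

If the real markup differs from this assumed fragment, only the class name checked in handle_starttag should need to change.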

Lena
1

First, wait for the element to be clickable, then scroll down to it before reading the rating. Try:

element.location_once_scrolled_into_view
element.text
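
The "wait first" step can replace a fixed time.sleep(10) with polling. The sketch below is a plain-Python loop, not a Selenium API (Selenium ships its own WebDriverWait for this). The names wait_until and fake_rating_text are hypothetical and only illustrate the idea: check repeatedly until the text shows up, then give up after a timeout.

```python
import time


def wait_until(condition, timeout=10.0, poll=0.5):
    """Call condition() until it returns a truthy value, then return that value.
    Raise TimeoutError if the deadline passes without a truthy result."""
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} s")
        time.sleep(poll)


# Stand-in for an element whose text only appears on the third check.
attempts = {"count": 0}


def fake_rating_text():
    attempts["count"] += 1
    return "2" if attempts["count"] >= 3 else ""


rating_text = wait_until(fake_rating_text, timeout=2.0, poll=0.01)
print(rating_text)
```

With a real page, the condition could be something like lambda: element.text.strip(), so the script continues as soon as the text is rendered instead of always sleeping for the full ten seconds.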

EDIT:

Use the following XPath selector:

'//a[@data-test="recommendation-rating-header"]//following-sibling::div//div[@class="rating-text Arrow South Fw(b) Bgc($buy) Bdtc($buy)"]'

Then you will have:

rating = driver.find_element_by_xpath('//a[@data-test="recommendation-rating-header"]//following-sibling::div//div[@class="rating-text Arrow South Fw(b) Bgc($buy) Bdtc($buy)"]')

To extract the value of the slider, use

val = rating.get_attribute("aria-label")
nic
Mate Mrše
  • That CSS-selector works fine and it gives me 56, which is the "Total ESG-score", but it's not that element I'm trying to find. I'm trying to find the Recommended Rating, a scale from 1 to 5. I've tried with xpath, css_selector, class_name, but I can't get it to work. – PythonGeek Mar 19 '19 at 13:19
1

The CSS selector, div.rating-text, is working just fine and is unique on the page. Returning .text will give you the value you are looking for.

JeffC
0

The script below answers a different question but somehow I think this is what you are after.

import requests
from bs4 import BeautifulSoup

base_url = 'http://finviz.com/screener.ashx?v=152&s=ta_topgainers&o=price&c=0,1,2,3,4,5,6,7,25,63,64,65,66,67'
html = requests.get(base_url)
soup = BeautifulSoup(html.content, "html.parser")
main_div = soup.find('div', attrs = {'id':'screener-content'})

light_rows = main_div.find_all('tr', class_="table-light-row-cp")
dark_rows = main_div.find_all('tr', class_="table-dark-row-cp")

data = []
for rows_set in (light_rows, dark_rows):
    for row in rows_set:
        row_data = []
        for cell in row.find_all('td'):
            val = cell.a.get_text()
            row_data.append(val)
        data.append(row_data)

#   sort rows to maintain original order
data.sort(key=lambda x: int(x[0]))

import pandas
pandas.DataFrame(data).to_csv("AAA.csv", header=False)

ASH