
Here is the code that I am testing.

import csv
import requests
from bs4 import BeautifulSoup
from selenium import webdriver
profile = webdriver.FirefoxProfile()
profile.accept_untrusted_certs = True
import time


#browser = webdriver.Firefox(executable_path="C:/Utility/geckodriver.exe")
wd = webdriver.Firefox(executable_path="C:/Selenium/geckodriver.exe", firefox_profile=profile)
url = "https://finviz.com/login.ashx"
wd.get(url)

# set username
time.sleep(1)
username = wd.find_element_by_name("email")
username.send_keys("me@gmail.com")
#wd.find_element_by_id("identifierNext").click()

# set password
#time.sleep(2)
password = wd.find_element_by_name("password")
password.send_keys("me_pass")

# https://stackoverflow.com/questions/21350605/python-selenium-click-on-button
wd.find_element_by_css_selector('.button.is-primary.is-large').click()


# wait max 10 seconds until "theID" visible in Logged In page
time.sleep(5)
#content = wd.page_source
#print(BeautifulSoup(content, 'html.parser'))


url_base = "https://finviz.com/quote.ashx?t="
tckr = ['SBUX','MSFT','AAPL']
url_list = [url_base + s for s in tckr]
#print(url_list)

with open('C:\\stocks.csv', 'a', newline='') as f:
    writer = csv.writer(f)

    for url in url_list:
        #print(url)
        try:
            wd.get(url)
            fpage = wd.current_url
            #print(fpage)
            data = fpage.text
            fsoup = BeautifulSoup(data, 'html.parser')
            #print(url_base)
            print(fsoup)

            # write header row
            writer.writerow(map(lambda e : e.text, fsoup.find_all('td', {'class':'snapshot-td2-cp'})))

            # write body row
            writer.writerow(map(lambda e : e.text, fsoup.find_all('td', {'class':'snapshot-td2'})))            

        except:
            print("{} - not found".format(url))

In the case above, my code goes straight to the `except` block because the `try` fails. I think the problem is in this line:

fsoup = BeautifulSoup(data, 'html.parser')

This is my error:

AttributeError: 'str' object has no attribute 'text'
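I can reproduce that exact message on any plain string, so something in my pipeline must be a `str` rather than a response/element object:

```python
# The same AttributeError, reproduced on a bare string:
s = "https://finviz.com/quote.ashx?t=SBUX"
try:
    s.text
except AttributeError as e:
    print(e)  # 'str' object has no attribute 'text'
```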

I looked at the documentation here:

https://www.selenium.dev/documentation/en/webdriver/web_element/

I guess the webdriver output has to be handed to BeautifulSoup, but for some reason they are not playing well together. I'm stuck now. Thoughts? Suggestions?

ASH
    Frankly, you could do this even without `BeautifulSoup` using `wd.find_element_by_xpath()` or `wd.find_element_by_class_name()` – furas May 27 '20 at 15:33

1 Answer


After your `wd.get(url)`, do this:

fpage=wd.page_source
fsoup = BeautifulSoup(fpage, 'html.parser')
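In the loop from the question, that means replacing `fpage = wd.current_url` / `data = fpage.text` with `wd.page_source`. A self-contained sketch of the parsing-and-writing step (using a stand-in HTML string in place of `wd.page_source`, since a real browser can't be driven here):

```python
import csv
import io

from bs4 import BeautifulSoup


def write_snapshot(page_source, writer):
    """Parse Finviz-style snapshot cells from raw HTML and write header/body rows."""
    fsoup = BeautifulSoup(page_source, "html.parser")
    writer.writerow([e.text for e in fsoup.find_all("td", {"class": "snapshot-td2-cp"})])
    writer.writerow([e.text for e in fsoup.find_all("td", {"class": "snapshot-td2"})])


# Usage with a made-up fragment standing in for wd.page_source:
html = '<td class="snapshot-td2-cp">P/E</td><td class="snapshot-td2">25.0</td>'
buf = io.StringIO()
write_snapshot(html, csv.writer(buf))
print(buf.getvalue())  # P/E on the first row, 25.0 on the second
```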
0buz
  • That works! Thanks. One more thing. How can I get the 'tckr' to identify each set of records. Now I have no identifiers for all the data in each table? – ASH May 27 '20 at 15:14
  • You could integrate it into `writer.writerow`. I haven't tried it, but something like `writer.writerow(url[-4:] +',' + map(lambda e : e.text, fsoup.find_all('td', {'class':'snapshot-td2-cp'})))` – 0buz May 27 '20 at 15:21
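The snippet in the comment above won't run as written (a `str` can't be concatenated with a `map` object), but the idea works if both sides are lists. A sketch against a made-up two-row snapshot table (the real Finviz markup will have many more cells):

```python
from bs4 import BeautifulSoup

# Hypothetical HTML standing in for a Finviz quote page.
html = """
<table>
<tr><td class="snapshot-td2-cp">P/E</td><td class="snapshot-td2">25.0</td></tr>
<tr><td class="snapshot-td2-cp">EPS</td><td class="snapshot-td2">3.10</td></tr>
</table>
"""
fsoup = BeautifulSoup(html, "html.parser")
url = "https://finviz.com/quote.ashx?t=SBUX"

# Take the ticker from the URL query string (safer than url[-4:],
# which breaks on tickers that aren't exactly four characters),
# then prepend it to each row as its own column.
ticker = url.split("t=")[-1]
header = [ticker] + [e.text for e in fsoup.find_all("td", {"class": "snapshot-td2-cp"})]
body = [ticker] + [e.text for e in fsoup.find_all("td", {"class": "snapshot-td2"})]
print(header)  # ['SBUX', 'P/E', 'EPS']
print(body)    # ['SBUX', '25.0', '3.10']
```

Each `header`/`body` list can then be passed straight to `writer.writerow(...)`.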