
I was hoping to ask a question about scraping information from a website into an Excel sheet using Selenium/SeleniumBase. I have been playing around with Selenium and SeleniumBase for a few days, and I'm at the point where I want to begin scraping data points into an Excel sheet. This is my code so far:

import selenium
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service
import time
import os
from selenium.webdriver.common.action_chains import ActionChains

from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC

from selenium.webdriver.remote.webelement import WebElement
from selenium.webdriver.common.keys import Keys
from seleniumbase import BaseCase
from seleniumbase import js_utils
import pandas as pd

s=Service('/Users/[name]/Desktop/chromedriver')

chromeOptions = Options()
chromeOptions.headless = False
driver = webdriver.Chrome(service=s, options=chromeOptions)

class RecorderTest(BaseCase):
    def test_recording(self):
        self.open("https://www.finestrahealth.com/")
        self.assert_exact_text("finestra", 'a[href="/"] span')
        self.type('input[placeholder="Procedure"]', "mri")
        self.type('input[placeholder="Zip code"]', "60007")
        self.type('input[placeholder="Insurance"]', "atena")
        self.click('button:contains("Search")')
        self.assert_element('main p:contains("Hospitals")')
        
        out_pocket_price = driver.find_element(By.CLASS_NAME, "text-[40px] text-[#2962FF] leading-[58px] mb-3 tracking-tight smooth")

        print("out of pocket price:" + out_pocket_price)
        self.sleep(20)

if __name__ == "__main__":
    from pytest import main
    main([__file__])

To begin with, the out_pocket_price line is coded incorrectly, so I would love some insight on that. Assuming that part gets figured out, how would I take this value and put it into an Excel sheet along with other data points? The website is a bit strange because its HTML does not have a value attribute, which is what most tutorials suggest using.

Thanks very much! It's been fun playing around with this framework. Stack Overflow has helped a lot thus far.

    In addition to the solution I just posted, make sure to run with `pytest -s` so that you can see the output of `print()` statements, which are captured by default in pytest. – Michael Mintz Dec 25 '22 at 14:37

1 Answer


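It looks like you have two browsers running, a lot of extra code, and not a good selector for the out-of-pocket price. The module-level `driver = webdriver.Chrome(...)` starts a separate browser from the one SeleniumBase launches for `BaseCase`, so `driver.find_element(...)` searches a window that never loaded the site. Here's a script that does everything with SeleniumBase alone, prints the prices, and outputs the results to a CSV file, which is the easiest way to get something that opens in Excel: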

import codecs
import os
from seleniumbase import BaseCase

class MedicalTest(BaseCase):
    def test_medical_list(self):
        self.open("https://www.finestrahealth.com/")
        self.assert_exact_text("finestra", 'a[href="/"] span')
        self.type('input[placeholder="Procedure"]', "mri")
        self.type('input[placeholder="Zip code"]', "02142")
        self.type('input[placeholder="Insurance"]', "aetna")
        self.click('button:contains("Search")')
        self.assert_element('main p:contains("Hospitals")')
        self.wait_for_element("div.items-center div.flex")
        data_to_save = []
        print("\nFound prices:")
        for item in self.find_elements("div.text-white"):
            print(item.text)
            data_to_save.append(item.text)
        file_name = os.path.join(".", "data_file.csv")
        with codecs.open(file_name, "w", "utf-8") as data_file:
            data_file.write("\r\n".join(data_to_save))
        print("Data saved to data_file.csv")
        self.sleep(2)

if __name__ == "__main__":  # Use "python" to call "pytest"
    from pytest import main
    main([__file__, "-s"])

Here's the console output I got from running that:

Found prices:
$44934
$27139
$17134
$30009
$11
Data saved to data_file.csv
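The prices above come back as text such as `$44934`, so Excel will treat them as strings. If you want numeric cells and room for other data points, one option is to write one CSV row per result with several columns. This is a minimal sketch, assuming the scraped strings look like the output above. `parse_price`, `save_prices_csv`, `prices_to_csv_text`, and the column names are hypothetical, not part of SeleniumBase.

```python
import csv
import io
import re


def parse_price(text):
    # Keep only digits and the decimal point, so "$44934" -> 44934.0
    # and "$1,250.50" -> 1250.5; return None when no number is present.
    digits = re.sub(r"[^0-9.]", "", text)
    return float(digits) if digits else None


def save_prices_csv(price_texts, out):
    # One row per scraped price. To store other data points, add more
    # keys (for example a hypothetical "hospital" column) to fieldnames
    # and to each row dict.
    fieldnames = ["position", "price_text", "price"]
    writer = csv.DictWriter(out, fieldnames=fieldnames)
    writer.writeheader()
    for position, text in enumerate(price_texts, start=1):
        writer.writerow({
            "position": position,
            "price_text": text,
            "price": parse_price(text),
        })


def prices_to_csv_text(price_texts):
    # Build the CSV in memory, handy for checking output without a file.
    buf = io.StringIO()
    save_prices_csv(price_texts, buf)
    return buf.getvalue()
```

Inside `test_medical_list` you could pass `data_to_save` to `save_prices_csv` with a file opened via `open("data_file.csv", "w", newline="", encoding="utf-8")`; `newline=""` lets the `csv` module control line endings. If you'd rather produce a real .xlsx file, `pandas.DataFrame(rows).to_excel("data_file.xlsx", index=False)` does that, provided an Excel engine such as openpyxl is installed.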

(Since the `MedicalTest` script prints its results and pytest captures print() output by default, run it with pytest -s, unless you already have a pytest.ini file with that option set. The `__main__` block already passes -s when you run the file with python.)
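As for the original `out_pocket_price` line: `By.CLASS_NAME` expects a single class name, not a space-separated list. Characters such as `[`, `]` and `#` in Tailwind-style class names like `text-[40px]` must also be escaped in a CSS selector. Separately, `"out of pocket price:" + out_pocket_price` concatenates a string with a WebElement, which raises a TypeError; the visible price is in the element's `.text`, not in a value attribute. This is a simplified sketch of turning such a class list into an escaped CSS selector. `css_escape_class` and `class_list_to_css` are hypothetical helpers, and the escaping covers common cases rather than every rule of the CSS spec.

```python
def css_escape_class(name):
    # Backslash-escape characters that are not allowed in a bare CSS class
    # name. Simplified: handles names like "text-[40px]" and
    # "text-[#2962FF]", but not every edge case (e.g. control characters).
    out = []
    for i, ch in enumerate(name):
        if ch.isascii() and (ch.isalnum() or ch in "-_"):
            leading_digit = ch.isdigit() and (
                i == 0 or (i == 1 and name[0] == "-")
            )
            # A leading digit must be written as a hex escape plus a space.
            out.append(f"\\{ord(ch):x} " if leading_digit else ch)
        elif not ch.isascii():
            out.append(ch)  # non-ASCII characters are allowed unescaped
        else:
            out.append("\\" + ch)  # e.g. "[" -> "\[", "#" -> "\#"
    return "".join(out)


def class_list_to_css(class_attr):
    # "a b c" -> ".a.b.c": the element must carry every listed class.
    return "".join("." + css_escape_class(c) for c in class_attr.split())
```

You could then pass the result to `self.get_text(selector)` inside a SeleniumBase test, or to `driver.find_element(By.CSS_SELECTOR, selector).text` with plain Selenium. Long lists of styling classes tend to change when a site is redesigned, so a shorter selector like `div.text-white` above is usually sturdier.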

Michael Mintz
    Hey again, thank you very much for your help. Your responses have really helped me learn a lot about how to navigate using SeleniumBase. I noticed you are actually the director so that's super awesome! Thanks again and happy holidays! Looking forward to playing around with SeleniumBase even more. – imherebcsomethingisntworking Dec 25 '22 at 20:01
  • Great to hear! Happy Holidays! – Michael Mintz Dec 26 '22 at 04:15