
I'm trying to scrape this website:

https://www.novanthealth.org/home/patients--visitors/locations/clinics.aspx?behavioral-health=yes

I want to get the clinic names and addresses. This is the Python code I'm using:

from selenium import webdriver
import pandas as pd
import time

#driver = webdriver.Chrome()
specialty = ["behavioral-health","dermatology","colon","ear-nose-and-throat","endocrine","express","family-practice","foot-and-ankle",
           "gastroenterology","heart-%26-vascular","hepatobiliary-and-pancreas","infectious-disease","inpatient","internal-medicine",
           "neurology","nutrition","ob%2Fgyn","occupational-medicine","oncology","orthopedics","osteoporosis","pain-management",
           "pediatrics","plastic-surgery","pulmonary","rehabilitation","rheumatology","sleep","spine","sports-medicine","surgical","urgent-care",
           "urology","weight-loss","wound-care","pharmacy"]
name = []
address = []

for q in specialty: 
    driver = webdriver.Chrome()
    driver.get("https://www.novanthealth.org/home/patients--visitors/locations/clinics.aspx?"+q+"=yes")
    x = driver.find_element_by_class_name("loc-link-right")
    num_page = str(x.text).split(" ")
    x.click() 

    for i in num_page:
        btn = driver.find_element_by_xpath('//*[@id="searchResults"]/div[2]/div[2]/button['+i+']')
        btn.click() 
        time.sleep(8) # instead of this, use an explicit wait (WebDriverWait)
        temp = driver.find_element_by_class_name("gray-background").text
        temp0 = temp.replace("Get directions Website View providers\n","")

        x_temp = temp0.split("\n\n\n")

        for j in range(0,len(x_temp)-1):
            temp1 = x_temp[j].split("Phone:")
            name.append(temp1[0].split("\n")[1])
            temp3 = temp1[1].split("Office hours:")
            temp4 = temp3[0].split("\n")
            temp5 = temp4[1:len(temp4)]
            address.append(" ".join(temp5))
    driver.close()
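As a side note, the chain of `split()` calls at the end of the loop is hard to debug while Selenium is driving the browser. It can be pulled into a standalone helper and tested against a sample string first. The sample layout assumed below is a guess at what the "gray-background" text looks like, inferred from the splits in the code, not the real page:

```python
def parse_clinics(blob):
    """Split the text of the 'gray-background' element into
    (name, address) pairs, using the same splits as the loop above.

    Assumes each entry looks roughly like
    "\\nName\\nPhone: ...\\naddress lines...\\nOffice hours: ..."
    with entries separated by blank lines -- an assumption about
    the page layout, not the verified markup.
    """
    blob = blob.replace("Get directions Website View providers\n", "")
    pairs = []
    for entry in blob.split("\n\n\n"):
        if "Phone:" not in entry or "Office hours:" not in entry:
            continue  # skip empty/trailing chunks
        head, rest = entry.split("Phone:", 1)
        name = head.split("\n")[1]  # line after the leading newline
        # everything between the phone line and "Office hours:" is the address
        addr_lines = rest.split("Office hours:")[0].split("\n")[1:]
        pairs.append((name, " ".join(addr_lines).strip()))
    return pairs
```

With a helper like this, the Selenium part of the loop only has to hand over `driver.find_element_by_class_name("gray-background").text`.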

This code works fine if I use it for only one specialty at a time, but when I pass the specialties in a loop as above, it fails in the second iteration with the error:

Traceback (most recent call last):
  File "<stdin>", line 10, in <module>
  File "C:\Anaconda2\lib\site-packages\selenium\webdriver\remote\webelement.py", line 77, in click
    self._execute(Command.CLICK_ELEMENT)
  File "C:\Anaconda2\lib\site-packages\selenium\webdriver\remote\webelement.py", line 493, in _execute
    return self._parent.execute(command, params)
  File "C:\Anaconda2\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 249, in execute
    self.error_handler.check_response(response)
  File "C:\Anaconda2\lib\site-packages\selenium\webdriver\remote\errorhandler.py", line 193, in check_response
    raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.ElementNotVisibleException: Message: element not visible
  (Session info: chrome=46.0.2490.80)
  (Driver info: chromedriver=2.19.346078 (6f1f0cde889532d48ce8242342d0b84f94b114a1), platform=Windows NT 6.1 SP1 x86_64)

I don't have much experience with Python; any help will be appreciated.

Vaibhav
  • You have to make your web driver wait for a few seconds until the corresponding element appears on the page. Have a look at the `WebDriverWait` function. – Avinash Raj Apr 24 '17 at 07:44
  • I was already going through the documentation on that, but was facing some issues implementing it. Can you give some sample code for it? Thanks! – Vaibhav Apr 24 '17 at 07:51
  • here it is http://stackoverflow.com/a/41832157/3297613 – Avinash Raj Apr 24 '17 at 07:53
  • @AvinashRaj I added `wait = WebDriverWait(driver, 10)` and `wait.until(EC.presence_of_element_located((By.ID, "searchResults")))` above `btn = driver.find_element_by_xpath('//*[@id="searchResults"]/div[2]/div[2]/button['+i+']')`. This time it ran for 2 iterations but gave the same error in the third iteration – Vaibhav Apr 24 '17 at 08:21
  • @Vaibhav: it is worth avoiding asking directly for "a sample code" here. That is usually understood to mean "will you do my work for me", even if that is not the actual intent. – halfer Apr 24 '17 at 08:21
  • @Vaibhav In your url, why there's a space between `patients-- visitors`?? Does it get the page you want? – Chanda Korat Apr 24 '17 at 09:21
  • @halfer yeah I'll keep that in mind, Thanks! – Vaibhav Apr 24 '17 at 10:07
  • @ChandaKorat yeah nice catch, the space got added while pasting the code here – Vaibhav Apr 24 '17 at 10:08

2 Answers


The error message tells you why it doesn't work:

ElementNotVisibleException: Message: element not visible

The element is not visible if you have not scrolled down to it.

You have to scroll down the list, according to the size of your browser window,

OR

Just extract the data from the source page, which is easier.
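For the scrolling route, running `driver.execute_script("arguments[0].scrollIntoView(true);", btn)` just before `btn.click()` is usually enough. For the second route, here is a minimal sketch of pulling the headings straight out of the page source with the standard library's parser (Python 3's `html.parser`; in Python 2 it lives in the `HTMLParser` module). The `loc-heading` class name is taken from the other answer's XPath, so treat it as an assumption about the real markup:

```python
from html.parser import HTMLParser

class LocHeadingParser(HTMLParser):
    """Collect the text of every <div> whose class contains
    'loc-heading' (an assumed class name; check the real markup)."""

    def __init__(self):
        HTMLParser.__init__(self)
        self.in_heading = False
        self.headings = []

    def handle_starttag(self, tag, attrs):
        if tag == "div" and "loc-heading" in dict(attrs).get("class", ""):
            self.in_heading = True

    def handle_endtag(self, tag):
        if tag == "div":
            self.in_heading = False

    def handle_data(self, data):
        if self.in_heading and data.strip():
            self.headings.append(data.strip())

# feed it driver.page_source (or HTML you saved to disk):
parser = LocHeadingParser()
parser.feed('<div class="span12 loc-heading">Clinic One</div>'
            '<div class="other">skip me</div>'
            '<div class="span12 loc-heading">Clinic Two</div>')
print(parser.headings)  # ['Clinic One', 'Clinic Two']
```

Since this reads `page_source` rather than clicking through elements, visibility no longer matters.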

M. Leung

Usually I would do this in Selenium Basic, an Excel plugin. You can use the same logic in Python. I tried this in VBA and it works fine for me.

Private assert As New assert
Private driver As New Selenium.ChromeDriver

Sub sel_novanHealth()
    Set ObjWB = ThisWorkbook
    Set ObjExl_Sheet1 = ObjWB.Worksheets("Sheet1")
    Dim Name As Variant

    'Open the website
    driver.get "https://www.novanthealth.org/home/patients--visitors/locations.aspx"
    driver.Window.Maximize
    driver.Wait (1000)

    'Find out the total number of pages to be scraped
    lnth = driver.FindElementsByXPath("//button[@class='paginate_button']").Count

    'Running the loop for the pages
    For y = 2 To lnth
        'Running the loop for the elements
        For x = 1 To 10
            Name = driver.FindElementsByXPath("//div[@class='span12 loc-heading']")(x).Text
            'Element 2
            'Element 3
        Next x
        driver.FindElementsByXPath("//button[@class='paginate_button']")(y).Click
    Next y

    driver.Wait (1000)
End Sub
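The same loop in Python might look like the sketch below. The XPaths are copied from the VBA above; the function takes the driver as an argument and looks the paginate buttons up again on every pass, on the assumption that clicking re-renders the result list:

```python
def scrape_headings(driver, per_page=10):
    """Python version of the VBA loop: read up to per_page clinic
    headings on each page, then click the next paginate button."""
    names = []
    num_buttons = len(driver.find_elements_by_xpath(
        "//button[@class='paginate_button']"))   # VBA: lnth
    for page in range(1, num_buttons):           # VBA: For y = 2 To lnth
        headings = driver.find_elements_by_xpath(
            "//div[@class='span12 loc-heading']")
        names.extend(h.text for h in headings[:per_page])
        # re-find the buttons each pass in case the DOM was rebuilt
        driver.find_elements_by_xpath(
            "//button[@class='paginate_button']")[page].click()
    return names
```

In real use you would still want a `WebDriverWait` after each click rather than a fixed sleep, for the visibility problem discussed above.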
Sanjoy