
I am still learning Python and have been practicing web scraping. I recently wanted to try out Selenium. This is my project so far.

My goal is to turn my output (which I believe is a string) into a pandas DataFrame. Unfortunately, I do not know how to separate the data into separate columns.

Here is my code so far:

from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
import time
import pandas as pd

class FindByIdName:

    def test(self):
        baseUrl = 'https://vaccinefinder.org/'
        # Selenium 4 style: pass the driver path via Service; the raw string
        # keeps the backslashes in the Windows path from acting as escapes
        driver = webdriver.Chrome(service=Service(r'C:\webdrivers\chromedriver.exe'))
        driver.get(baseUrl)
        time.sleep(3)

        # find the COVID vaccine button
        # (find_element(By.XPATH, ...) replaces find_element_by_xpath, which newer Selenium removed)
        vaccineElement = driver.find_element(By.XPATH, "/html//div[@id='__next']/div/div[2]/div[1]//button[.='Find COVID-19 Vaccines']")
        vaccineElement.click()
        time.sleep(3)

        # uncheck the Moderna vaccine
        modernaElement = driver.find_element(By.XPATH, "/html//div[@id='split-screen-content']/form/div[@class='search-form__fields-row']//label[.='Moderna COVID Vaccine']")
        modernaElement.click()
        time.sleep(2)

        # uncheck the Pfizer vaccine
        pfizerElement = driver.find_element(By.XPATH, "/html//div[@id='split-screen-content']/form/div[@class='search-form__fields-row']//label[.='Pfizer-BioNTech COVID Vaccine']")
        pfizerElement.click()
        time.sleep(2)

        # enter the zip code
        zipElement = driver.find_element(By.XPATH, ".//*[@id='zipCode']")
        zipElement.send_keys('92646')
        time.sleep(3)

        # click the submit button for the zip code search
        searchElement = driver.find_element(By.XPATH, "//div[@id='split-screen-content']/form//button[@type='submit']")
        searchElement.click()
        time.sleep(3)

        # find and print the vaccine data
        listElement = driver.find_element(By.XPATH, "//div[@id='split-screen-content']/main/div[3]")
        print(listElement.text)
        print(type(listElement.text))
        # quit() ends the whole browser session, not just the current window
        driver.quit()

ff = FindByIdName()
ff.test()

Here is an example of the final output that I am looking for:

#   Location                   Address                                       Distance    Inventory
1.  Vons Pharmacy #3160        8891 Atlanta Ave Huntington Beach, CA 92646   .61 miles   Out Of Stock
2.  Walmart Inc #10-5601       21132 Beach Blvd Huntington Beach, CA 92648   1.05 miles  Out Of Stock
3.  CVS Pharmacy, Inc #09483   10011 Adams Ave Huntington Beach, CA 92646    1.92 miles  Out Of Stock
... ...                        ...                                           ...         ...

I have seen some examples online that convert string data to a DataFrame, but in those examples the data is already in CSV format, e.g. "How to convert a string to a dataframe in Python".
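For comparison, the CSV-based examples rely on `pandas.read_csv` reading from an in-memory string wrapped in `io.StringIO`. A minimal sketch, using a hypothetical comma-separated string built by hand from two rows of the desired output above:

```python
import io

import pandas as pd

# Hypothetical CSV text: one record per line, fields separated by commas.
# This is the shape of input the CSV-based examples assume.
csv_text = """pharmacy,distance,status
Vons Pharmacy #3160,0.61 miles,Out Of Stock
Walmart Inc #10-5601,1.05 miles,Out Of Stock"""

# read_csv expects a path or file-like object, so wrap the string in StringIO.
df = pd.read_csv(io.StringIO(csv_text))
print(df)
```

The scraped `listElement.text` has no per-field delimiter like this; each field sits on its own line, which is why the CSV approach does not apply directly.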

Any help would be greatly appreciated. I will learn a lot from your advice. Thanks! =)

Able Archer

1 Answer


If you return the result instead of just printing it:

        ...
        result = listElement.text
        driver.close()
        
        return result

Then you can split() the text into a list of lines and reshape() that list into a DataFrame:

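import numpy as np
import pandas as pd

result = ff.test().split('\n')

# each pharmacy spans 6 consecutive lines of the scraped text
columns = ['number', 'pharmacy', 'address', 'distance', 'status', 'view']
# one row per pharmacy, one column per field
data = np.array(result).reshape(len(result)//len(columns), len(columns))

df = pd.DataFrame(data=data, columns=columns).drop(columns=['number', 'view'])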
    pharmacy                            address                                        distance    status
0   VONS PHARMACY #3160                 8891 Atlanta Ave • Huntington Beach, CA 92646  0.61 miles  Out Of Stock
1   Walmart Inc #10-5601                21132 Beach Blvd • Huntington Beach, CA 92648  1.05 miles  Out Of Stock
... ...                                 ...                                            ...         ...
48  Costco Wholesale Corporation #1110  7562 Center Ave • Huntington Beach, CA 92647   5.96 miles  Out Of Stock
49  Aloha Pharmacy #0596962             15611 Brookhurst St • Westminster, CA 92683    5.99 miles  Out Of Stock
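The split-and-reshape approach can be tried without launching a browser. The sketch below wraps it in a hypothetical helper, `text_to_dataframe`, and runs it on a made-up two-pharmacy sample laid out the way the answer describes (6 lines per pharmacy). The `'View'` lines are placeholders, since the real sixth line is not shown above. The helper also checks the line count first, because `reshape()` fails if the text does not divide into whole records.

```python
import numpy as np
import pandas as pd

# Column layout taken from the answer: 6 lines per pharmacy.
COLUMNS = ['number', 'pharmacy', 'address', 'distance', 'status', 'view']


def text_to_dataframe(text):
    """Hypothetical helper: group every 6 lines of scraped text into one row.

    Assumes every pharmacy occupies exactly len(COLUMNS) consecutive lines.
    """
    lines = text.splitlines()
    if len(lines) % len(COLUMNS) != 0:
        # reshape() would raise a less descriptive ValueError here
        raise ValueError(
            f'{len(lines)} lines is not a multiple of {len(COLUMNS)}; '
            'the page layout may differ from what this parser assumes'
        )
    # -1 lets NumPy compute the number of rows (len(lines) // len(COLUMNS))
    data = np.array(lines).reshape(-1, len(COLUMNS))
    return pd.DataFrame(data=data, columns=COLUMNS).drop(columns=['number', 'view'])


# Hypothetical sample shaped like listElement.text; 'View' is a placeholder
# for whatever the sixth line of each record really contains.
sample = '\n'.join([
    '1', 'VONS PHARMACY #3160', '8891 Atlanta Ave • Huntington Beach, CA 92646',
    '0.61 miles', 'Out Of Stock', 'View',
    '2', 'Walmart Inc #10-5601', '21132 Beach Blvd • Huntington Beach, CA 92648',
    '1.05 miles', 'Out Of Stock', 'View',
])

df = text_to_dataframe(sample)
print(df)
```

`splitlines()` is used here instead of `split('\n')` so that a trailing newline or Windows-style `\r\n` line endings do not add stray elements that would break the reshape.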
tdy
    Ty sir @tdy! Much appreciated. Can you please explain what this line of code does? `data = np.array(result).reshape(len(result)//len(columns), len(columns))` – Able Archer Apr 21 '21 at 05:30
    @AbleArcher You're welcome! That line converts `result` from a Python list into a 2-D NumPy array. In the original list, every 6 consecutive elements described one pharmacy, so reshaping the 1-D list into 2-D gives each row 1 pharmacy with 6 columns of info. – tdy Apr 21 '21 at 05:51
    Awesome! So grateful for your response. =) – Able Archer Apr 21 '21 at 05:52
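To see what the reshape does in isolation, here is a small sketch with twelve placeholder strings standing in for two pharmacies of six fields each. It also shows an equivalent grouping with plain list slicing, a swapped-in alternative that avoids NumPy:

```python
import numpy as np

# Twelve placeholder items standing in for 2 pharmacies x 6 fields.
flat = [f'field{i}' for i in range(12)]

# 1-D shape (12,) -> 2-D shape (2, 6). NumPy fills rows in order,
# so items 0-5 become row 0 and items 6-11 become row 1.
grid = np.array(flat).reshape(len(flat) // 6, 6)
print(grid.shape)  # (2, 6)

# The same grouping without NumPy: slice the list in steps of 6.
rows = [flat[i:i + 6] for i in range(0, len(flat), 6)]
print(len(rows), len(rows[0]))  # 2 6
```

Either form can be passed to `pd.DataFrame(..., columns=columns)`; a list of lists is accepted directly.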