I wrote the Python code below, using the Selenium library, to scrape all 239 rows of a table on a website. The first four columns scrape correctly with XPath selectors, but the last four columns keep returning empty values (" ") even though the elements are present on the page.

from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
#from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support.wait import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
import time
import pandas as pd

url = 'https://www.adducation.info/general-knowledge-travel-and-transport/emergency-numbers/'
path= 'xxxxxxxxxx'


service=Service(executable_path=path)
driver=webdriver.Chrome(service=service)
driver.get(url)
driver.implicitly_wait(20)
time.sleep(10)

containers = driver.find_elements(by='xpath', value='//tr')

Country = []
Emergency = []
Police = []
Ambulance = []
Fire = []
Group = []
Calling_codes = []
Local_emergency_no = []


P=range(1,240)
for i,j in zip(containers,P):
    try:
        A = i.find_element(by='xpath',value=f'//tr[{j}]/td[1]/strong').text  
        B = i.find_element(by='xpath',value=f'//tr[{j}]/td[2]').text
        C = i.find_element(by='xpath',value=f'//tr[{j}]/td[3]').text
        D = i.find_element(by='xpath',value=f'//tr[{j}]/td[4]').text
        E = i.find_element(by='xpath',value=f'//tr[{j}]/td[5]').text
        F = i.find_element(by='xpath',value=f'//tr[{j}]/td[6]').text
        G = i.find_element(by='xpath',value=f'//tr[{j}]/td[7]').text
        H = i.find_element(by='xpath',value=f'//tr[{j}]/td[8]').text
    
    except:
        A = i.find_element(by='xpath',value=f'//tr[{j}]/td[1]/em/strong').text
        B = i.find_element(by='xpath',value=f'//tr[{j}]/td[2]').text
        C = i.find_element(by='xpath',value=f'//tr[{j}]/td[3]').text
        D = i.find_element(by='xpath',value=f'//tr[{j}]/td[4]').text
        E = i.find_element(by='xpath',value=f'//tr[{j}]/td[5]').text
        F = i.find_element(by='xpath',value=f'//tr[{j}]/td[6]').text
        G = i.find_element(by='xpath',value=f'//tr[{j}]/td[7]').text
        H = i.find_element(by='xpath',value=f'//tr[{j}]/td[8]').text
    

    finally:
        Country.append(A)
        Emergency.append(B)
        Police.append(C)
        Ambulance.append(D)
        Fire.append(E)
        Group.append(F)
        Calling_codes.append(G)
        Local_emergency_no.append(H)

dict_={'Country' : Country,
    'Emergency' : Emergency, 
    'Police' : Police, 
    'Ambulance' : Ambulance, 
    'Fire' : Fire, 
    'Continent' : Group, 
    'Calling_codes' : Calling_codes,
    'Local_emergency_no' : Local_emergency_no
    }

Emergency_DS = pd.DataFrame(dict_)
print(Emergency_DS)

3 Answers

To scrape the Emergency Numbers List table from the page "Emergency Numbers List: 911, 112 & 999 Numbers Worldwide", induce WebDriverWait for visibility_of_element_located() on the <table> element, then pass the element's outerHTML to read_html() from Pandas to build a DataFrame. You can use the following locator strategy:

Code Block:

from io import StringIO

import pandas as pd
from selenium import webdriver
from selenium.webdriver.chrome.options import Options  # was missing: Options() is used below
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

options = Options()
options.add_argument("start-maximized")
driver = webdriver.Chrome(options=options)
driver.get(url='https://www.adducation.info/general-knowledge-travel-and-transport/emergency-numbers/')
# Wait up to 20 seconds for the table to become visible, then grab its HTML
table_data = WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.CSS_SELECTOR, "table.dataList.footable-loaded.footable.default"))).get_attribute("outerHTML")
# read_html() returns a list of DataFrames; StringIO avoids the literal-HTML deprecation warning in newer pandas
df = pd.read_html(StringIO(table_data))
print(df)
driver.quit()

Console Output:

[    Country / Territory ☎ Emergency  ... Calling codes                     Local emergency numbers & info
0         Afghanistan         NaN  ...           +93    You can dial 020 112 from mobile but only in K...
1             Albania         NaN  ...          +355                                                  NaN
2             Algeria         NaN  ...          +213                        Dial 1548 for tourist police.
3      American Samoa         911  ...        +1 684                                                  NaN
4             Andorra         112  ...          +376                                                  NaN
..                  ...         ...  ...           ...                                                ...
234   Wallis & Futuna         NaN  ...          +681                                                  NaN
235    Western Sahara         150  ...          +212              This disputed state is part of Morocco.
236             Yemen         NaN  ...          +967                                                  NaN
237            Zambia         112  ...          +260                                                  NaN
238          Zimbabwe         999  ...          +264                                                  NaN

[239 rows x 8 columns]]
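The outer square brackets in that output appear because read_html() returns a list of DataFrames, one per <table> it finds. A minimal sketch of picking out the single table, assuming a hand-written two-row HTML snippet (hypothetical, standing in for the real outerHTML) and an installed HTML parser such as lxml:

```python
from io import StringIO

import pandas as pd

# Hypothetical stand-in for the outerHTML of the real table.
html = """
<table>
  <thead><tr><th>Country / Territory</th><th>Group</th></tr></thead>
  <tbody>
    <tr><td>Andorra</td><td>Europe</td></tr>
    <tr><td>Zambia</td><td>Africa</td></tr>
  </tbody>
</table>
"""

tables = pd.read_html(StringIO(html))  # always a list, even for one table
df = tables[0]                         # the DataFrame itself
print(len(tables), df.shape)
```

With the list unwrapped, the usual DataFrame methods and attributes (df.head(), df.columns and so on) are available directly.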

undetected Selenium

You do not need Selenium (a testing framework, vastly misused for web scraping purposes) to obtain that data. Here is another way:

import pandas as pd
df = pd.read_html('https://www.adducation.info/general-knowledge-travel-and-transport/emergency-numbers/')[0]
print(df)

Result in terminal:

     Country / Territory  ☎ Emergency  ☎ Police  ☎ Ambulance  ☎ Fire    Group            Calling codes  Local emergency numbers & info
0    Afghanistan          NaN          119       119, 102     112, 119  Asia             +93            You can dial 020 112 from mobile but only in Kabul.
1    Albania              NaN          129       127          128       Europe           +355           NaN
2    Algeria              NaN          17        14           14        Africa           +213           Dial 1548 for tourist police.
3    American Samoa       911          NaN       NaN          NaN       Oceania          +1 684         NaN
4    Andorra              112          110       118          118       Europe           +376           NaN
...  ...                  ...          ...       ...          ...       ...              ...            ...
234  Wallis & Futuna      NaN          18        15           17        French, Oceania  +681           NaN
235  Western Sahara       150          NaN       NaN          NaN       Africa           +212           This disputed state is part of Morocco.
236  Yemen                NaN          194       191          191       Asia             +967           NaN
237  Zambia               112          999       993          991       Africa           +260           NaN
238  Zimbabwe             999          995       994          993       Africa           +264           NaN

239 rows × 8 columns

You can also save that dataframe as a .csv document, if you want -- see pandas documentation for more information.
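A minimal sketch of that CSV step, assuming a small hand-made DataFrame (two rows copied from the output above) in place of the scraped one. The file name emergency_numbers.csv in the comment is hypothetical:

```python
from io import StringIO

import pandas as pd

# Stand-in for the scraped DataFrame: two rows copied from the output above.
df = pd.DataFrame({
    "Country / Territory": ["Andorra", "Zambia"],
    "☎ Emergency": ["112", "112"],
    "Group": ["Europe", "Africa"],
})

# index=False leaves out the 0..N row labels.
# To write a file instead, pass a path:
#   df.to_csv("emergency_numbers.csv", index=False)
buffer = StringIO()
df.to_csv(buffer, index=False)
csv_text = buffer.getvalue()
print(csv_text)
```

Writing to an in-memory buffer here only keeps the sketch free of side effects; with a real path, to_csv() writes the same text to disk.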

Barry the Platipus
  • Yeah, thank you for this. I already know about this approach. Basically, I'm working on using Selenium to web scrape for a personal project, so I was hoping I could find a way around this. – Mubaraq Onipede Aug 05 '23 at 16:41
  • This answer addresses the issue in your question, namely getting the respective data @MubaraqOnipede – Barry the Platipus Aug 05 '23 at 16:52
  • I just got an answer using Selenium. I included driver.maximize_window() to access the whole window. – Mubaraq Onipede Aug 05 '23 at 17:04
  • And again, why do you feel Selenium is vastly misused for web scraping, as opposed to being used as a testing framework? – Mubaraq Onipede Aug 05 '23 at 17:06

If you still want to use Selenium for this, you just have to add this line right after creating the driver:

driver.maximize_window()

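Otherwise the table doesn't appear in full, presumably because the responsive table hides some columns in a narrow window, and Selenium's .text returns an empty string for elements that are not displayed. Moreover, you can decrease your implicit wait, and you have to handle the cookies popup if you don't dismiss it yourself.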
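Put together, the Selenium route might look like the sketch below. The names COLUMNS, scrape_rows and main are hypothetical, and the XPath //tr[td] is an assumption about the page's markup rather than something confirmed in this thread. Each row is queried with the relative XPath ./td, because an expression starting with // searches the whole document even when it is called on a row element.

```python
URL = "https://www.adducation.info/general-knowledge-travel-and-transport/emergency-numbers/"

COLUMNS = ["Country", "Emergency", "Police", "Ambulance", "Fire",
           "Group", "Calling_codes", "Local_emergency_no"]


def scrape_rows(driver):
    """Return one dict per table row that has exactly len(COLUMNS) <td> cells."""
    records = []
    # Assumption: data rows are the <tr> elements that contain <td> children.
    for row in driver.find_elements("xpath", "//tr[td]"):
        # "./td" is evaluated relative to this row only.
        cells = [td.text.strip() for td in row.find_elements("xpath", "./td")]
        if len(cells) == len(COLUMNS):
            records.append(dict(zip(COLUMNS, cells)))
    return records


def main():
    # Imported here so scrape_rows() can be exercised without a browser.
    from selenium import webdriver

    driver = webdriver.Chrome()
    try:
        driver.maximize_window()   # widen the window before loading the page
        driver.implicitly_wait(5)  # a shorter implicit wait, as suggested above
        driver.get(URL)
        for record in scrape_rows(driver)[:3]:
            print(record)
    finally:
        driver.quit()
```

Calling main() runs the scrape against the live site. The cookies popup mentioned above is not handled in this sketch.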

  • Thank youuuuu. You are a lifesaver!!!!!!. I just figured it out now. With maximizing the window I was able to scrape it all. Thankssss – Mubaraq Onipede Aug 05 '23 at 17:03