I am trying to scrape company info from kompass.com
However, since each company profile provides a different amount of detail, certain pages may be missing elements. For example, not all companies have info on 'Associations'. In such cases, my script takes an extremely long time searching for these missing elements. Is there any way I can speed up the search process?
Here's an excerpt of my script:
import time
import selenium
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.common.exceptions import TimeoutException
from selenium.common.exceptions import ElementNotVisibleException
from lxml import html
def init_driver():
    driver = webdriver.Firefox()
    driver.wait = WebDriverWait(driver, 5)
    return driver
def convert2text(webElement):
    # find_elements_* returns a list; an empty list means the element is missing.
    if webElement != []:
        webElement = webElement[0].text.encode('utf8')
    else:
        webElement = 'NA'
    return webElement
link='http://sg.kompass.com/c/mizkan-asia-pacific-pte-ltd/sg050477/'
driver = init_driver()
driver.get(link)
driver.implicitly_wait(10)  # this wait applies to every find_elements_* call on this driver
name = driver.find_elements_by_xpath("//*[@id='productDetailUpdateable']/div[1]/div[2]/div/h1")
name = convert2text(name)
## Problem:
associations = driver.find_elements_by_xpath("//body//div[@class='item minHeight']/div[@id='associations']/div/ul/li/strong")
associations = convert2text(associations)
It takes more than a minute to scrape each page, and I have more than 26,000 pages to scrape.
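To clarify the behaviour I expect from convert2text when an element is missing, here is a minimal, browser-free sketch. FakeElement is a hypothetical stand-in I made up for a Selenium WebElement so the helper can be shown without launching a driver; it is not part of my actual script:

```python
class FakeElement:
    """Hypothetical stand-in for a Selenium WebElement (illustration only)."""
    def __init__(self, text):
        self.text = text


def convert2text(webElement):
    # find_elements_* returns a list; an empty list means the element is missing.
    if webElement != []:
        return webElement[0].text.encode('utf8')
    return 'NA'


# A matched element yields its text as UTF-8 bytes.
print(convert2text([FakeElement(u'Mizkan Asia Pacific Pte Ltd')]))
# An empty result list yields the placeholder 'NA' instead of raising.
print(convert2text([]))
```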