I'm trying to scrape a table from a dynamic page. After running the code below (which requires Selenium), I manage to get the contents of the <table> element.
I'd like to convert this table into a CSV file. I have tried two things, but both fail:
1. pandas.read_html returns an error saying I don't have html5lib installed, but I do, and in fact I can import it without problems.
2. soup.find_all('tr') returns the error 'NoneType' object is not callable after I run soup = BeautifulSoup(tablehtml).
Here is my code:
import time
from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.keys import Keys
import pandas as pd
main_url = "http://data.stats.gov.cn/english/easyquery.htm?cn=E0101"
driver = webdriver.Firefox()
driver.get(main_url)
time.sleep(7)
driver.find_element_by_partial_link_text("Industry").click()
time.sleep(7)
driver.find_element_by_partial_link_text("Main Economic Indicat").click()
time.sleep(6)
driver.find_element_by_id("mySelect_sj").click()
time.sleep(2)
driver.find_element_by_class_name("dtText").send_keys("last72")
time.sleep(3)
driver.find_element_by_class_name("dtTextBtn").click()
time.sleep(2)
table = driver.find_element_by_id("table_main")
tablehtml = table.get_attribute('innerHTML')
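
For reference, this is roughly the result I am after, shown as a minimal sketch that runs without a browser. It is not a fix for either error above: it swaps in the standard library's html.parser and csv modules in place of pandas and BeautifulSoup. The names TableRowParser, table_html_to_csv and sample_html are hypothetical, and the sample markup is made up to stand in for the tablehtml string; the real table may be structured differently.

```python
import csv
import io
from html.parser import HTMLParser


class TableRowParser(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []      # completed rows, each a list of cell strings
        self._row = None    # cells of the row currently being read
        self._cell = None   # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that sits inside a table cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
            self._cell = None


def table_html_to_csv(table_html):
    """Return the rows found in table_html as CSV text."""
    parser = TableRowParser()
    parser.feed(table_html)
    parser.close()
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(parser.rows)
    return buffer.getvalue()


# Made-up stand-in for the innerHTML string returned by Selenium (assumption).
sample_html = (
    "<thead><tr><th>Indicators</th><th>Jan 2015</th></tr></thead>"
    "<tbody><tr><td>Example indicator</td><td>1,234.5</td></tr></tbody>"
)
csv_text = table_html_to_csv(sample_html)
print(csv_text)
```

In my script I would call table_html_to_csv(tablehtml) and write the returned text to a file.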