I can scrape the tables of a static website to CSV with the following code:

import pandas as pd
url = 'http://www.etnet.com.hk/www/tc/futures/index.php?subtype=HSI&month=201801&tab=interval'
for i, df in enumerate(pd.read_html(url)):
    filename = 'C:/Users/Lawrence/Desktop/PyTest/output%02d.csv' % i
    df.to_csv(filename, encoding='UTF-8')

However, it doesn't work for dynamic websites. How can I achieve this?

P.S.: I am using Python 3.6

Yung Lin Ma
  • What do you want to achieve exactly? Make a request to a URL and then save the content of the response in a CSV? Also, do you want to be able to scrape JavaScript-heavy sites as well? – Szabolcs Feb 07 '18 at 14:08
  • The code I provided is the best solution for me to scrape all the tables on a website and save them as CSV. However, I want it to work on dynamic websites too, for example http://www.etnet.com.hk/www/eng/futures/index.php. The link above cannot be scraped by pandas; how can I do it? – Yung Lin Ma Feb 07 '18 at 14:12
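Whether a page is "dynamic" in the sense that matters here can be checked directly: `pd.read_html(url)` parses only the HTML the server sends and does not run any JavaScript, so if that raw HTML contains no `<table>` elements, pandas has nothing to find. The sketch below counts `<table>` tags using only the standard library. The names `count_tables` and `fetch_raw_html`, and the browser-like `User-Agent` header, are illustrative assumptions rather than part of the original question or answer.

```python
from html.parser import HTMLParser
from urllib.request import Request, urlopen


class TableCounter(HTMLParser):
    """Counts <table> start tags in an HTML document."""

    def __init__(self):
        super().__init__()
        self.tables = 0

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag names, so <TABLE> and <table> both match.
        # Text inside <script> is not parsed as markup, so tables that a
        # script would build at runtime are not counted.
        if tag == "table":
            self.tables += 1


def count_tables(html):
    """Return the number of <table> elements present in raw HTML text."""
    parser = TableCounter()
    parser.feed(html)
    parser.close()
    return parser.tables


def fetch_raw_html(url, timeout=10):
    """Download the HTML the server sends, without running any JavaScript."""
    # Assumption: some sites reject urllib's default User-Agent.
    request = Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urlopen(request, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


if __name__ == "__main__":
    static_page = "<html><body><table><tr><td>1</td></tr></table></body></html>"
    # A JavaScript-rendered page often ships only an empty container and a script.
    dynamic_page = '<html><body><div id="app"></div><script src="app.js"></script></body></html>'
    print(count_tables(static_page))   # 1
    print(count_tables(dynamic_page))  # 0
```

If `count_tables(fetch_raw_html(url))` returns 0 for a page whose tables are visible in the browser, that suggests the tables are inserted by JavaScript, which is the case where a browser-driven tool such as Selenium is needed.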

1 Answer

You could use Selenium's WebDriver, which drives a real web browser, so the page's JavaScript runs before you read the HTML. In your example, the smallest change to your code is to load the page in the browser and pass the rendered HTML (wd.page_source) to pd.read_html:

from io import StringIO

import pandas as pd
from selenium import webdriver

url = 'http://www.etnet.com.hk/www/tc/futures/index.php?subtype=HSI&month=201801&tab=interval'

# The following lines make the browser headless, i.e. it doesn't open a window
options = webdriver.ChromeOptions()
options.add_argument('--headless')
options.add_argument('--window-size=1200,600')  # Chrome expects "width,height"

wd = webdriver.Chrome(options=options)  # Open a browser using the options set

try:
    wd.get(url)  # Open the desired url in the browser
    # Feed the rendered HTML to pd.read_html; StringIO avoids the deprecation
    # of passing literal HTML strings in newer pandas versions
    for i, df in enumerate(pd.read_html(StringIO(wd.page_source))):
        filename = 'C:/Users/Lawrence/Desktop/PyTest/output%02d.csv' % i
        df.to_csv(filename, encoding='UTF-8')
finally:
    wd.quit()  # Close the browser and end the chromedriver process
francisco sollima
  • Hi francisco sollima, I ran the code you provided and put a screenshot of the result in the question. The error message is too complicated for me; could you please help me fix it? – Yung Lin Ma Feb 07 '18 at 14:42
  • Try with `wd = webdriver.Chrome()` or `wd = webdriver.Firefox()` – francisco sollima Feb 07 '18 at 14:44
  • I tried `wd = webdriver.Chrome()`, but got an error again. Please see the image in the question; I updated it again. Thank you :) – Yung Lin Ma Feb 07 '18 at 15:40
  • Apparently, you don't have the Chrome driver (chromedriver) installed, at least not where Selenium can find it. See https://stackoverflow.com/a/34522424/4345659 for installing chromedriver on Windows (I think you have to download it and place it in a folder on your PATH, such as C:\Windows; it should be simple). Install it and try again. – francisco sollima Feb 07 '18 at 15:44
  • Thank you Francisco, you are so knowledgeable and helpful:) – Yung Lin Ma Feb 07 '18 at 16:33