I'm getting empty table data by python web scraping

Question

import requests
from bs4 import BeautifulSoup
import lxml.html as lh

url = "https://whalewisdom.com/filer/renaissance-technologies-llc#tabholdings_tab_link"
response = requests.get(url)
print(response)
soup = BeautifulSoup(response.content, 'html.parser')
# fromstring() takes the markup only; its second argument is a base URL, not a parser name
doc = lh.fromstring(response.content).xpath("//table[@id='current_holdings_table']")

# Print the raw HTML of every matching table
for i in doc:
    html_data = lh.tostring(i)
    print(html_data)

# soup_table = soup.find_all('table', attrs={'id': 'current_holdings_table'})

You can see the output in the image below; I'm getting empty table data:

user2382321 · Answer 1 · 2021-02-21T17:03:44.523

I'm not familiar with BeautifulSoup, but you can do it with Selenium, which loads the page in a real browser:

from selenium import webdriver

# Raw string so the backslashes in the Windows path are not treated as escapes
path = r"C:\Program Files (x86)\chromedriver.exe"
# Selenium 3 style; Selenium 4 expects Chrome(service=Service(path)) instead
driver = webdriver.Chrome(path)
url = "https://whalewisdom.com/filer/renaissance-technologies-llc#tabholdings_tab_link"
driver.get(url)
table = driver.execute_script("return document.getElementById('current_holdings_table')")
print(table)
# find_elements("xpath", ...) works in both Selenium 3 and Selenium 4
rows = driver.find_elements("xpath", "//table[@id='current_holdings_table']//tr")
for row in rows:
    print(row.get_attribute('innerHTML'))
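
The loop above prints the raw innerHTML of each row. If you want the cell text instead, here is a minimal sketch using only the standard library's html.parser. The names CellTextParser and row_cells are hypothetical helpers, and the sample row in the usage line is made up, not taken from the site:

```python
from html.parser import HTMLParser


class CellTextParser(HTMLParser):
    """Collects the text inside each <td>/<th> of a row's innerHTML."""

    def __init__(self):
        super().__init__()
        self.cells = []
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        # A new cell starts: open an empty text buffer for it
        if tag in ("td", "th"):
            self._in_cell = True
            self.cells.append("")

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._in_cell = False

    def handle_data(self, data):
        # Text nested in links or spans inside a cell is collected too
        if self._in_cell:
            self.cells[-1] += data


def row_cells(inner_html):
    """Return the stripped text of every cell in one table row."""
    parser = CellTextParser()
    parser.feed(inner_html)
    parser.close()
    return [cell.strip() for cell in parser.cells]


# Made-up sample row, for illustration only
print(row_cells('<td> AAPL </td><td><a href="#">Apple Inc</a></td>'))
```

In the Selenium loop you would call row_cells(row.get_attribute('innerHTML')) instead of printing the HTML directly.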

If you don't want to open a Chrome browser window, you can do it with a headless browser like PhantomJS. You will need to pip install phantomjs (https://pypi.org/project/phantomjs/). The code to run this is:

from selenium import webdriver

# webdriver.PhantomJS exists only in Selenium 3; it was removed in Selenium 4
driver = webdriver.PhantomJS()
driver.set_window_size(1120, 550)
url = "https://whalewisdom.com/filer/renaissance-technologies-llc#tabholdings_tab_link"
driver.get(url)
table = driver.execute_script("return document.getElementById('current_holdings_table')")
rows = driver.find_elements("xpath", "//table[@id='current_holdings_table']//tr")
for row in rows:
    print(row.get_attribute('innerHTML'))
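
Another way to avoid a visible browser window is to run Chrome itself in headless mode. This is only a sketch, assuming Chrome and a matching chromedriver are installed and on PATH. The helper names headless_arguments and make_headless_chrome are hypothetical, and the window size simply mirrors the PhantomJS example:

```python
def headless_arguments(width=1120, height=550):
    """Command-line switches that make Chrome run without a window."""
    return ["--headless", f"--window-size={width},{height}"]


def make_headless_chrome():
    """Start a headless Chrome driver (assumes chromedriver is on PATH)."""
    # Imported here so the helper above can be used without Selenium installed
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options

    options = Options()
    for argument in headless_arguments():
        options.add_argument(argument)
    return webdriver.Chrome(options=options)
```

Calling driver = make_headless_chrome() would replace the webdriver.PhantomJS() line; the rest of the scraping code stays the same.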

You will likely need to add some time.sleep() calls to give the webpage time to load in the headless browser before you try to scrape the table values.
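
Instead of guessing a fixed sleep duration, you could poll until the rows actually appear. Here is a rough sketch of that idea in plain Python. The name wait_for_rows and its default timeout and poll values are my own assumptions, not part of Selenium (Selenium ships a WebDriverWait class built on the same polling idea):

```python
import time


def wait_for_rows(find_rows, timeout=15.0, poll=0.5):
    """Call find_rows() repeatedly until it returns a non-empty list.

    Raises TimeoutError if nothing shows up within `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    while True:
        rows = find_rows()
        if rows:
            return rows
        if time.monotonic() >= deadline:
            raise TimeoutError("table rows did not appear in time")
        # Short pause before asking the browser again
        time.sleep(poll)
```

With a driver from the code above, usage would look something like rows = wait_for_rows(lambda: driver.find_elements("xpath", "//table[@id='current_holdings_table']//tr")).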

Thank you so much, now it's working. But I have one other issue: why does the Chrome browser open during code execution? If I deploy this on a server, how will it work? — Ramlakhan Kevat, Feb 21 '21 at 06:59
Thanks. When I run the PhantomJS() code, it gives this error: "selenium.common.exceptions.WebDriverException: Message: 'phantomjs' executable needs to be in PATH." I installed selenium with pip, so how do I set up webdriver.PhantomJS()? — Ramlakhan Kevat, Feb 23 '21 at 05:33
What operating system are you using on your server? Also, did you pip install phantomjs on your server? This page may help you https://stackoverflow.com/questions/37903536/phantomjs-with-selenium-error-message-phantomjs-executable-needs-to-be-in-pa — user2382321, Feb 23 '21 at 23:41
