import requests
from bs4 import BeautifulSoup
import lxml.html as lh

url = "https://whalewisdom.com/filer/renaissance-technologies-llc#tabholdings_tab_link"
response = requests.get(url)
print(response)
soup = BeautifulSoup(response.content, 'html.parser')
# lh.fromstring() takes the markup only; its second positional argument is a base URL, not a parser name
doc = lh.fromstring(response.content).xpath("//table[@id='current_holdings_table']")


for table in doc:
    html_data = lh.tostring(table)
    print(html_data)

# soup_table = soup.find_all('table', attrs={'id': 'current_holdings_table'})

The output shows that I'm getting empty table data.

Ruvee
Ramlakhan Kevat

1 Answer

I'm not familiar with BeautifulSoup, but this works using Selenium:

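from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By

# Raw string so the backslashes in the Windows path are not treated as escapes
path = r"C:\Program Files (x86)\chromedriver.exe"
# Selenium 4 takes the driver path through a Service object
driver = webdriver.Chrome(service=Service(path))
url = "https://whalewisdom.com/filer/renaissance-technologies-llc#tabholdings_tab_link"
driver.get(url)
# Returns a WebElement for the table, or None if it is not in the DOM
table = driver.execute_script("return document.getElementById('current_holdings_table')")
print(table)
# find_elements(By.XPATH, ...) replaces find_elements_by_xpath, which Selenium 4 removed
rows = driver.find_elements(By.XPATH, "//table[@id='current_holdings_table']//tr")
for row in rows:
    print(row.get_attribute('innerHTML'))
driver.quit()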

If you don't want to open the Chrome browser, you can use a headless browser such as PhantomJS. You will need to pip install phantomjs (https://pypi.org/project/phantomjs/). The code to run this is:

from selenium import webdriver
from selenium.webdriver.common.by import By

# webdriver.PhantomJS exists only in Selenium 3.x; Selenium 4 removed it
driver = webdriver.PhantomJS()
driver.set_window_size(1120, 550)
url = "https://whalewisdom.com/filer/renaissance-technologies-llc#tabholdings_tab_link"
driver.get(url)
rows = driver.find_elements(By.XPATH, "//table[@id='current_holdings_table']//tr")
for row in rows:
    print(row.get_attribute('innerHTML'))
driver.quit()
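
PhantomJS is no longer maintained and Selenium 4 dropped its driver, so on a recent Selenium install the same idea can be sketched with Chrome's own headless mode instead. This is only a sketch under assumptions: the names fetch_table_rows and headless_chrome_args are made up for illustration, Chrome is assumed to be installed, and Selenium is assumed to be 4.6 or later so that it can locate a matching driver by itself.

```python
def headless_chrome_args(width=1120, height=550):
    """Command-line switches for Chrome (hypothetical helper, no Selenium needed).

    "--headless=new" assumes a recent Chrome; older builds use plain "--headless".
    """
    return ["--headless=new", f"--window-size={width},{height}"]


def fetch_table_rows(url, table_id="current_holdings_table"):
    """Return the innerHTML of every <tr> in the table with the given id."""
    # Imported lazily so the helper above can be used without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By

    options = webdriver.ChromeOptions()
    for arg in headless_chrome_args():
        options.add_argument(arg)

    # No browser window opens because of the headless switch.
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)
        rows = driver.find_elements(By.XPATH, f"//table[@id='{table_id}']//tr")
        return [row.get_attribute("innerHTML") for row in rows]
    finally:
        # Always shut the browser down, even if the lookup fails.
        driver.quit()


# Usage (needs Chrome and network access):
# url = "https://whalewisdom.com/filer/renaissance-technologies-llc#tabholdings_tab_link"
# for row_html in fetch_table_rows(url):
#     print(row_html)
```

Because nothing is drawn on screen, this variant should also run on a server without a display, provided Chrome itself is installed there.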

You will likely need to put in some time.sleep() calls to allow the webpage to load in the headless browser before you try and scrape the table values.
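
As an alternative to fixed time.sleep() calls, Selenium's WebDriverWait can poll until the rows actually exist. The sketch below is an assumption-laden example rather than a tested recipe for this site: data_rows_xpath and wait_for_rows are hypothetical names, and it assumes the data rows contain td cells while any header row uses th cells.

```python
def data_rows_xpath(table_id):
    """XPath for rows that hold at least one <td>, i.e. data rows, not headers."""
    return f"//table[@id='{table_id}']//tr[td]"


def wait_for_rows(driver, table_id="current_holdings_table", timeout=15):
    """Block until the table has at least one data row, then return the rows.

    Raises selenium.common.exceptions.TimeoutException if no row shows up
    within `timeout` seconds.
    """
    # Imported lazily so data_rows_xpath() stays usable without Selenium installed.
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support.ui import WebDriverWait

    xpath = data_rows_xpath(table_id)
    # until() keeps calling the lambda and returns its first truthy result;
    # an empty list is falsy, so polling continues until rows exist.
    return WebDriverWait(driver, timeout).until(
        lambda d: d.find_elements(By.XPATH, xpath)
    )


# Usage, with an already created driver:
# driver.get(url)
# for row in wait_for_rows(driver):
#     print(row.get_attribute("innerHTML"))
```

Compared with sleeping for a fixed time, this returns as soon as the rows appear and fails with a clear timeout error if they never do.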

user2382321
  • Thank you so much, now it's working. But I have one other issue: why does the Chrome browser open during code execution? If I deploy this on a server, how will it work? – Ramlakhan Kevat Feb 21 '21 at 06:59
  • Thanks. When I run the PhantomJS() code, it gives an error: "selenium.common.exceptions.WebDriverException: Message: 'phantomjs' executable needs to be in PATH." I installed selenium with pip, so how can I set up webdriver.PhantomJS()? – Ramlakhan Kevat Feb 23 '21 at 05:33
  • What operating system are you using on your server? Also, did you pip install phantomjs on your server? This page may help you https://stackoverflow.com/questions/37903536/phantomjs-with-selenium-error-message-phantomjs-executable-needs-to-be-in-pa – user2382321 Feb 23 '21 at 23:41