
I am attempting to capture data from a website that stores it in tables. There are four tables in total; the first two are captured correctly, but the last two come back as empty DataFrames. I don't know whether it's because the last two tables are buried too deep in the HTML or because they take longer to load. I've also started trying Selenium to see if that helps with the table load times, but haven't had luck there yet either.

Thanks

import requests
import pandas as pd

r = requests.get("https://netcapital.com/companies/ghost")
# read_html parses every <table> element it finds in the response body
dfs = pd.read_html(r.text)
dfs

Output
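The symptom can be reproduced offline: pd.read_html parses every table it finds, and a table whose tbody is empty in the raw HTML (because rows are injected by JavaScript at runtime) comes back as an empty DataFrame with only headers. The markup below is a made-up stand-in for the page, not the site's actual HTML.

```python
# Offline illustration: a static table parses normally, while a table
# whose <tbody> is empty in the served HTML yields an empty DataFrame.
import io

import pandas as pd

html = """
<table>
  <tr><th>name</th></tr>
  <tr><td>Ghost</td></tr>
</table>
<table>
  <thead>
    <tr><th>quantityBid</th><th>price</th><th>quantityAsk</th></tr>
  </thead>
  <tbody></tbody>  <!-- rows are injected by JavaScript at runtime -->
</table>
"""

dfs = pd.read_html(io.StringIO(html))
print(dfs[0].empty, dfs[1].empty)  # second frame has headers but no rows
```

This matches the bs4 observation in the comments below: the served HTML contains only the table headers.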

kris b
  • What do you need to extract from the website? – bigbounty Jul 14 '20 at 04:28
  • Look up BeautifulSoup – Goose Jul 14 '20 at 04:39
  • I am looking to extract the last two tables of the website. The text labels for them are "Spread" and "Transaction Log". The "Spread" table will be [quantityBid, price, quantityAsk]. The "Transaction Log" table will be [transactionDate, type, quantity, price] – kris b Jul 14 '20 at 04:41
  • @Goose I should have added that I started out with bs4. The soup comes back only with the table headers but no other data. – kris b Jul 14 '20 at 04:47
  • If you look at the html for the page, you will see the tables are empty. The data must get filled in by a javascript function. – RootTwo Jul 14 '20 at 05:28
  • @bigbounty I just came across another answer that you'd given for a different question. https://stackoverflow.com/questions/52010016/web-scraping-extract-javascript-table-seleniumpython Worked like a champ!! – kris b Jul 14 '20 at 06:12
  • @krisb Cool, glad to know my answers helped. Upvote the answer so that it comes up on SO when others search for similar answers – bigbounty Jul 14 '20 at 06:15
  • See answer to https://stackoverflow.com/questions/52010016/web-scraping-extract-javascript-table-seleniumpython from @bigbounty. Works great for this as well – kris b Jul 14 '20 at 17:15
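For reference, the Selenium approach pointed to in the comments can be sketched roughly as follows. This is untested against the live site; the Chrome driver, the generic "table" locator, and the 15-second timeout are all assumptions to adapt.

```python
# Hedged sketch: render the page in a real browser so its JavaScript can
# fill the tables, wait for a <table> to appear, then hand the rendered
# HTML to pandas. Locator and timeout are assumptions, not site specifics.
import io

import pandas as pd


def scrape_tables(url, timeout=15):
    """Load `url` in Chrome, wait for a table to render, return DataFrames."""
    # Imported lazily so the module can be loaded without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    driver = webdriver.Chrome()
    try:
        driver.get(url)
        # Block until at least one <table> element is present in the DOM.
        WebDriverWait(driver, timeout).until(
            EC.presence_of_element_located((By.TAG_NAME, "table"))
        )
        # page_source now includes the JavaScript-rendered rows.
        return pd.read_html(io.StringIO(driver.page_source))
    finally:
        driver.quit()
```

Usage would be `dfs = scrape_tables("https://netcapital.com/companies/ghost")`, after which the last two frames should contain the Spread and Transaction Log data.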

0 Answers