I have a problem with web scraping in Python. I'm trying to get the data from the first table on https://www.nyse.com/ipo-center/filings using AsyncHTMLSession from the requests_html package.
My code is here:
from bs4 import BeautifulSoup
from requests_html import AsyncHTMLSession

# First define the URL and start the session
url = 'https://www.nyse.com/ipo-center/filings'
session = AsyncHTMLSession()

# Then get the URL content and render the page so the JavaScript-generated HTML is loaded
r = await session.get(url)
await r.html.arender()

# Then create a BeautifulSoup object based on the rendered HTML
soup = BeautifulSoup(r.html.html, "lxml")

# Then find the first data table, which is the one that contains upcoming IPO data
table1 = soup.find('table', class_='table table-data table-condensed spacer-lg')
Now I have 2 problems with that:
- Often the website doesn't return any valid data for table1, so I don't get the information inside the table. So far I work around that by waiting a couple of seconds and then running the loop again until the dataframe is loaded. That's probably not the best option, though.
- The code works in Jupyter Notebook, but once I upload it as a .py file to my server, I get the error message SyntaxError: 'await' outside async function.
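To make the retry workaround from the first point concrete, it could be sketched roughly as below. This is only an illustration under stated assumptions: retry_until_loaded and make_fake_fetch are hypothetical names, and the fake fetcher stands in for the real requests_html calls so the snippet runs without network access. Because a plain .py script only allows await inside an async def function, the sketch wraps everything in main() and starts it with asyncio.run.

```python
import asyncio


async def retry_until_loaded(fetch, attempts=5, delay=2.0):
    """Await `fetch()` repeatedly until it returns something other than None.

    `fetch` is any coroutine function; in the real script it would do the
    session.get / arender / soup.find steps and return the table (or None).
    """
    for attempt in range(1, attempts + 1):
        result = await fetch()
        if result is not None:
            return result
        if attempt < attempts:
            # Wait a little before asking the site again.
            await asyncio.sleep(delay)
    return None  # Gave up after `attempts` tries.


def make_fake_fetch(fail_times):
    """Hypothetical stand-in: returns None `fail_times` times, then a dummy table."""
    calls = {"count": 0}

    async def fake_fetch():
        calls["count"] += 1
        if calls["count"] <= fail_times:
            return None
        return "<table>...</table>"  # Placeholder, not real NYSE data.

    return fake_fetch


async def main():
    # All `await` expressions live inside an async function.
    table = await retry_until_loaded(make_fake_fetch(fail_times=2), delay=0.01)
    print("table loaded:", table is not None)


if __name__ == "__main__":
    asyncio.run(main())
```

Capping the number of attempts keeps the script from looping forever if the table never shows up.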
Does anybody have a solution to the 2 problems mentioned above?