I am trying to read a table from a web page. My company has strict authentication policies that restrict how we can scrape data, but the following code is what I am currently using to fetch the page:
from urllib.request import urlopen
from requests_kerberos import HTTPKerberosAuth, OPTIONAL
import os
import lxml.html as LH
import requests
import pandas as pd
cert = r"C:\Users\name\Desktop\cacert.pem"  # raw string, so single backslashes suffice
os.environ["REQUESTS_CA_BUNDLE"] = cert
kerberos = HTTPKerberosAuth(mutual_authentication=OPTIONAL)
session = requests.Session()
link = 'weblink'
data = session.get(link, auth=kerberos, verify=False).content.decode("latin-1")
That leaves me with the entire HTML of the web page as a string in "data". How do I convert the table in it into a pandas DataFrame?
Note: I can't provide the actual link due to privacy concerns. I am just wondering whether there is a general approach I can use in this situation.
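For reference, one general approach that may fit here is pandas.read_html, which accepts HTML text and returns one DataFrame per table it finds. The snippet below is only a sketch: the markup assigned to "data" is a made-up stand-in for the real page (which isn't available), it assumes the page contains an ordinary <table> element, and it needs an HTML parser such as lxml installed.

```python
from io import StringIO

import pandas as pd

# Hypothetical stand-in for the `data` string returned by the request;
# the real page content is not available, so this markup is illustrative.
data = """
<html><body>
<table>
  <tr><th>name</th><th>qty</th></tr>
  <tr><td>apple</td><td>3</td></tr>
  <tr><td>pear</td><td>5</td></tr>
</table>
</body></html>
"""

# read_html returns a list with one DataFrame per <table> in the input.
# Wrapping the string in StringIO avoids the deprecation warning that
# newer pandas versions emit when given a literal HTML string.
tables = pd.read_html(StringIO(data))

# Pick the table of interest by its position on the page.
df = tables[0]
print(df)
```

If the page holds several tables, inspecting len(tables) and each frame's columns is a simple way to find the right one; read_html also takes a match argument to keep only tables whose text matches a given string or regex.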