I'm trying to parse an SEC filing that is stored as a plain-text file but contains XML and HTML markup. This is what I have tried:
import requests
from bs4 import BeautifulSoup

page_link = 'https://www.sec.gov/Archives/edgar/data/1396092/0001209286-18-000042.txt'
page_response = requests.get(page_link, proxies=proxyDict)  # proxyDict holds my proxy settings, defined elsewhere
page_content = BeautifulSoup(page_response.content, "html.parser")
When I print page_content, it looks almost identical to the original file. What would be the best way to strip the markup out of page_content? Thanks.
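
For context, printing a BeautifulSoup object writes the parsed markup back out, which is probably why the output looks so much like the original file. A minimal sketch of one possible way to keep only the text is shown below, assuming that dropping every tag is acceptable. The helper name strip_markup and the sample string are hypothetical and used only for illustration; a real filing would likely need more handling for tables and for the SGML-style header tags.

```python
from bs4 import BeautifulSoup


def strip_markup(raw_text):
    """Return only the text content of a string that contains HTML/XML tags.

    Hypothetical helper for illustration, not part of any library.
    """
    soup = BeautifulSoup(raw_text, "html.parser")
    # Remove elements whose contents are not meant to be read as text.
    for tag in soup(["script", "style"]):
        tag.decompose()
    # get_text() joins all remaining text nodes; strip=True trims each piece
    # and the separator keeps neighbouring pieces on separate lines.
    return soup.get_text(separator="\n", strip=True)


# Small made-up sample standing in for the downloaded filing.
sample = "<html><body><p>Hello <b>world</b></p><style>p{}</style></body></html>"
cleaned = strip_markup(sample)
print(cleaned)
```

With the code from the question, the same idea would be strip_markup(page_response.content), or simply page_content.get_text() on the object that has already been parsed.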