How to parse an SEC EDGAR filing

Asked Oct 15 '19 at 14:46

Active Oct 15 '19 at 15:50

Viewed 1,452 times

I'm trying to parse an SEC filing that is stored as a plain-text file but contains embedded XML and HTML markup. This is what I have tried:

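import requests
from bs4 import BeautifulSoup

page_link = 'https://www.sec.gov/Archives/edgar/data/1396092/0001209286-18-000042.txt'
# proxyDict holds my proxy settings and is defined elsewhere
page_response = requests.get(page_link, proxies=proxyDict)
page_content = BeautifulSoup(page_response.content, "html.parser")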

When I print page_content, it looks almost identical to the original file. What would be the best way to clean up page_content? Thanks.

edited Oct 15 '19 at 15:50

Tomalak

asked Oct 15 '19 at 14:46

Warrior

You can use lxml or etree to parse the XML data – sahasrara62 Oct 15 '19 at 14:48
I'm tempted to close as a duplicate of https://stackoverflow.com/q/13504278/407651 – mzjn Oct 15 '19 at 16:05
What information are you looking for? – balderman Oct 20 '19 at 10:11
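
Following the etree suggestion in the comments, here is a minimal sketch of one possible approach, not a definitive answer. It assumes the file follows the usual EDGAR full-submission layout: a series of <DOCUMENT> blocks, each with unclosed header tags such as <TYPE> and <FILENAME>, and a <TEXT> section whose payload may be wrapped in <XML>. The helper names (header_value, split_submission) and the sample text are made up for illustration; the sample is not taken from the filing linked in the question.

```python
import re
import xml.etree.ElementTree as ET

DOC_RE = re.compile(r"<DOCUMENT>(.*?)</DOCUMENT>", re.DOTALL)
TEXT_RE = re.compile(r"<TEXT>(.*?)</TEXT>", re.DOTALL)
XML_RE = re.compile(r"<XML>(.*?)</XML>", re.DOTALL)


def header_value(block, tag):
    """Return the rest of the line after an unclosed header tag such as <TYPE>."""
    match = re.search(rf"<{tag}>([^\r\n]*)", block)
    return match.group(1).strip() if match else None


def split_submission(raw):
    """Split a full-submission text file into one dict per <DOCUMENT> block."""
    documents = []
    for block in DOC_RE.findall(raw):
        text = TEXT_RE.search(block)
        body = text.group(1).strip() if text else ""
        xml = XML_RE.search(body)
        documents.append({
            "type": header_value(block, "TYPE"),
            "filename": header_value(block, "FILENAME"),
            "is_xml": xml is not None,
            # Keep only the XML payload when the body is wrapped in <XML>
            "body": xml.group(1).strip() if xml else body,
        })
    return documents


# Hypothetical sample that mimics the assumed layout
sample = """<SEC-DOCUMENT>0000000000-00-000000.txt : 20180101
<DOCUMENT>
<TYPE>4
<SEQUENCE>1
<FILENAME>doc.xml
<TEXT>
<XML>
<?xml version="1.0"?>
<report><name>Example Corp</name></report>
</XML>
</TEXT>
</DOCUMENT>
</SEC-DOCUMENT>
"""

docs = split_submission(sample)
root = ET.fromstring(docs[0]["body"])  # parse the extracted XML payload
print(docs[0]["type"], docs[0]["filename"], root.findtext("name"))
```

For the real filing, page_response.text from the question could be passed to split_submission in place of sample. Bodies that are not flagged as XML are usually HTML or plain text, so those could still go through BeautifulSoup and get_text().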

0 Answers