I'm trying to extract the xml portion of code from a txt file in python. The current txt file I'm using is from the edgar database and has multiple representations of a 10-k report in one txt file, having html then xml, and then some other representations like PDF.
If anyone knows a way to extract this xml so I can use it's tags, I'd greatly appreciate it.
Here's an example of the txt file I'm talking about: https://www.sec.gov/Archives/edgar/data/51143/000005114313000007/0000051143-13-000007.txt