I need to extract XML data embedded in bankruptcy court files with C#. In a PDF reader the file looks like a typical court document, but in Notepad the XML is visible, buried in the raw content. I've tried extracting the text with this code snippet and with another one using SimpleTextExtractionStrategy. The first produces a file with no recognizable text from the PDF, and the second outputs only symbols. I also tried accessing the data through AcroFields and XfaForm, but based on the Watch window it doesn't seem to be either of those.
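One thing worth ruling out first, as a minimal sketch assuming iTextSharp 5.x: XML that is readable in Notepad is often an uncompressed XMP metadata packet, which the catalog references through its /Metadata key. PdfReader.Metadata returns that stream's bytes, or null if the catalog has no such entry. Whether these court files actually store their data there is an assumption, and the names XmpDump, ReadXmp and BytesToText are hypothetical.

```csharp
using System.Text;
using iTextSharp.text.pdf;

// Sketch assuming iTextSharp 5.x. Reads the XMP metadata stream referenced by
// the catalog's /Metadata key, if any. Assumes (not confirmed) that the court
// XML might live there.
public static class XmpDump
{
    // Decode raw stream bytes as UTF-8 (the usual XMP encoding; an assumption here).
    public static string BytesToText(byte[] bytes)
    {
        return bytes == null ? null : Encoding.UTF8.GetString(bytes);
    }

    // Returns the XMP packet as text, or null when the catalog has no /Metadata stream.
    public static string ReadXmp(string pdfPath)
    {
        PdfReader reader = new PdfReader(pdfPath);
        try
        {
            return BytesToText(reader.Metadata);
        }
        finally
        {
            reader.Close();
        }
    }
}
```

Usage would be XmpDump.ReadXmp(path) with the path to one of the court files; a null result suggests the XML is stored somewhere other than /Metadata.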
Stepping through the code in Visual Studio, the XML shows up in the Watch window under PdfReader >> Catalog >> Keys >> Raw >> Non-Public Members >> dictionary, but I don't know how to get to it from code. Since it's listed alongside other PdfName entries, I thought I might be able to reach it via PdfReader.Catalog.GetAsDict, but it doesn't display as a PdfName. The provider of these files has a Java app that seems to simply read the text. I'm not sure whether I need a different extraction strategy or whether I should directly access the catalog entry containing the XML. I've never worked with PDF files programmatically or used iTextSharp, so I'm struggling. Any code suggestions?
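A more general sketch, again assuming iTextSharp 5.x, is to walk every entry of the catalog dictionary, resolve indirect references with PdfReader.GetPdfObject, and decode any string or stream value that starts like XML. The keys of that dictionary are PdfName objects, while the XML would be a value (a PdfString or a stream), which may explain why it doesn't display as a PdfName. Which key the court files use is unknown, so this returns every candidate keyed by name. CatalogXmlFinder, FindXml and LooksLikeXml are hypothetical names, and nested dictionaries are not searched.

```csharp
using System;
using System.Collections.Generic;
using System.Text;
using iTextSharp.text.pdf;

// Sketch assuming iTextSharp 5.x: scan the top-level catalog entries for
// string or stream values that look like XML. Only direct children of the
// catalog are inspected; nested dictionaries are skipped.
public static class CatalogXmlFinder
{
    // Cheap heuristic: XML text starts with '<' after optional whitespace.
    public static bool LooksLikeXml(string text)
    {
        return text != null && text.TrimStart().StartsWith("<", StringComparison.Ordinal);
    }

    // Returns catalog key name (e.g. "/Metadata") -> XML-looking text.
    public static Dictionary<string, string> FindXml(string pdfPath)
    {
        var found = new Dictionary<string, string>();
        PdfReader reader = new PdfReader(pdfPath);
        try
        {
            PdfDictionary catalog = reader.Catalog;
            foreach (PdfName key in catalog.Keys)
            {
                // Catalog values are often indirect references; resolve them first.
                PdfObject value = PdfReader.GetPdfObject(catalog.Get(key));
                if (value == null) continue;

                string text = null;
                if (value.IsString())
                {
                    text = ((PdfString)value).ToUnicodeString();
                }
                else if (value.IsStream())
                {
                    // GetStreamBytes applies the stream's filters (e.g. FlateDecode);
                    // UTF-8 decoding is an assumption about the embedded XML.
                    byte[] bytes = PdfReader.GetStreamBytes((PRStream)value);
                    text = Encoding.UTF8.GetString(bytes);
                }

                if (LooksLikeXml(text))
                    found[key.ToString()] = text;
            }
        }
        finally
        {
            reader.Close();
        }
        return found;
    }
}
```

Printing the keys of the returned dictionary should show which catalog entry holds the XML; once that name is known, the loop can be replaced with a single catalog.Get(new PdfName(...)) lookup for that key.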