
In my C# project, I have been given the task of parsing an SGML file. I tried, very naively, to use XmlReader, and this has led to some interesting revelations (e.g., the differences between SGML and well-formed XML).

So I am thinking that I just need a good SGML parser that converts the file to XML, and I can go from there. In my search, I have found two SGML parsers that can integrate with my C# project:

  • SgmlReader (originally downloaded from MSDN)
  • James Clark's SP

Any other recommendations?

GP.
  • I'm curious. I didn't know anyone still used SGML. What for? – John Saunders Jul 19 '09 at 02:34
  • Avid iNEWS http://www.avid.com/solutions/808.htm uses an SGML-based markup called News Story Markup Language (NSML) to store and express story information. I can see from Avid's point of view why they used SGML, but that's another story (no pun intended). – GP. Jul 19 '09 at 15:24
  • SEC's EDGAR system uses SGML to mark up reporting to the SEC. – Matthew Lock May 04 '12 at 01:00
  • James Clark's SP package is out of date. It was turned into [an open source project](http://openjade.sourceforge.net/) years ago, along with his JADE program. – arayq2 Dec 31 '12 at 02:10
  • Did you make any progress with this? I have the exact same task but with Java and am finding it an absolute nightmare trying to get this to work! – maloney Feb 14 '13 at 16:27
  • The **MSDN SgmlReader** link has died. Use the **MindTouch** link in the official answer. THEN do a search for `SgmlReader` (because MindTouch reorganized themselves) and chase until you find the download for the ENTIRE MindTouch suite, which _contains_ the SgmlReader. Whew! – Jesse Chisholm Aug 26 '15 at 21:08

2 Answers


Apparently an updated SgmlReader is now hosted here:

https://github.com/MindTouch/SGMLReader
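For anyone taking the "convert SGML to XML, then use the normal XML APIs" route from the question, here is a minimal sketch. It assumes the SgmlReader library (namespace `Sgml`) from the repository above is referenced. `SgmlToXml` and `Convert` are hypothetical names, and the `dtdUri` parameter is an assumption for non-HTML dialects such as NSML, where you would point it at your own DTD.

```csharp
// Sketch: convert SGML to an XmlDocument with SgmlReader.
// Assumes the SgmlReader library (namespace Sgml) is referenced,
// e.g. built from https://github.com/MindTouch/SGMLReader.
using System.IO;
using System.Xml;
using Sgml;

public static class SgmlToXml
{
    // Hypothetical helper. docType is the root element name declared by the DTD
    // ("HTML" uses the HTML DTD bundled with SgmlReader); dtdUri is an assumed
    // location of the DTD for any other SGML dialect.
    public static XmlDocument Convert(TextReader sgml, string docType, string dtdUri = null)
    {
        var reader = new SgmlReader
        {
            DocType = docType,
            InputStream = sgml,
            CaseFolding = CaseFolding.ToLower,          // normalize element names to lower case
            WhitespaceHandling = WhitespaceHandling.None
        };
        if (dtdUri != null)
        {
            reader.SystemLiteral = dtdUri;              // where SgmlReader should load the DTD from
        }

        // SgmlReader derives from XmlReader, so XmlDocument.Load can consume it directly;
        // the DTD tells SgmlReader where omitted end tags belong.
        var doc = new XmlDocument { XmlResolver = null };
        doc.Load(reader);
        return doc;
    }
}
```

For example, `SgmlToXml.Convert(new StringReader("<p>Hello<br>world"), "HTML")` should yield a document in which `<br>` is a properly closed element. For NSML you would pass its root element name and the path to its DTD; both are inputs I'm assuming you have on hand.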

Matt Ellen
GP.
  • Don't despair: MindTouch reorganized themselves, but if you follow the link, then do a search for `SgmlReader` and chase until you find the download for the ENTIRE MindTouch suite, which _contains_ the SgmlReader. Whew! – Jesse Chisholm Aug 26 '15 at 21:09
  • I'm having problems with their library, but unfortunately their GitHub project page only allows Pull Requests and not filing issues. The last commits were years ago - I'm wary of putting effort into a contribution that might be ignored. – Dai Jun 18 '18 at 18:34

HTML (through HTML 4.01) is an application of SGML, so if you want to parse HTML properly, you need an SGML parser rather than an XML parser. SgmlReader appears to fit those needs well, and I plan to use it myself. Another option is HTML Tidy. It is a native application, but .NET bindings for it do exist. If you need entirely managed code, SgmlReader is the way to go.

Keith