1

I have an XML file that may look like this:

<unclassified>
  WOOD FIRM FINED #30,000 OVER TEEN'S LOST ARM<
</unclassified>

.dtd declaration:

<!ELEMENT unclassified   (#PCDATA)>

Unfortunately this does not seem to work since I'm always getting an error like this:

[Fatal Error] arm1sub.sgml:14:46: The content of elements must consist of well-formed character data or markup.
org.xml.sax.SAXParseException; systemId: file:/home/sfalk/workspace/project/target/classes/meter_corpus/PA/annotated/courts/12.07.99/arm/arm1sub.sgml; lineNumber: 14; columnNumber: 46; The content of elements must consist of well-formed character data or markup.
    at com.sun.org.apache.xerces.internal.parsers.DOMParser.parse(DOMParser.java:257)
    at com.sun.org.apache.xerces.internal.jaxp.DocumentBuilderImpl.parse(DocumentBuilderImpl.java:348)
    at javax.xml.parsers.DocumentBuilder.parse(DocumentBuilder.java:205)

How can I make this work? I hope this is somehow doable without modifying my .xml files.

Stefan Falk
  • This isn't a DTD problem - that's simply not well-formed XML. It would be broken without any DTD getting involved at all. You should be worried about whatever created the XML files. – Jon Skeet Mar 17 '15 at 14:47
  • See also http://stackoverflow.com/questions/730133/invalid-characters-in-xml/28152666 – potame Mar 17 '15 at 15:25

2 Answers

2

There is nothing you can change in the DTD to solve this problem. The "XML" document itself must be changed. (Technically, your document is not even really XML.)

The purview of DTDs (and XSDs) is validation, but a prerequisite for XML being valid is for it to be well-formed. (In fact, a prerequisite for a document being XML is that it be well-formed.)

Read Well-formed vs Valid XML for a thorough explanation of the difference. For your particular problem, replace < with &lt; to make your XML well-formed.
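
To illustrate, here is a minimal sketch in Java using the same `DocumentBuilder` API that appears in the stack trace. The class name `EscapeDemo` and the helper `textOrNull` are made up for this example, and the document is parsed from an in-memory string with no DTD attached, which shows that the failure is about well-formedness rather than validation.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

class EscapeDemo {
    // Hypothetical helper: returns the trimmed text of the root element,
    // or null if the input is not well-formed XML.
    static String textOrNull(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xml)));
            return doc.getDocumentElement().getTextContent().trim();
        } catch (SAXException e) {
            // The parser rejects the input before any DTD validation could happen.
            return null;
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Same content as in the question: a bare '<' inside character data.
        String raw = "<unclassified>\n  WOOD FIRM FINED #30,000 OVER TEEN'S LOST ARM<\n</unclassified>";
        // Same content with the '<' written as the predefined entity &lt;.
        String escaped = "<unclassified>\n  WOOD FIRM FINED #30,000 OVER TEEN'S LOST ARM&lt;\n</unclassified>";

        System.out.println(textOrNull(raw));     // null: not well-formed
        System.out.println(textOrNull(escaped)); // the headline, ending in '<'
    }
}
```

With the JDK's default parser, the first call may also log a `[Fatal Error]` line to stderr, similar to the one in the question. The second call returns the original text with the `<` restored, so nothing is lost by escaping it.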

kjhughes
2

If you want to use a value containing characters that the XML parser would otherwise treat as markup, you can wrap it in a CDATA section: http://www.w3schools.com/xml/xml_cdata.asp

<unclassified>
  <![CDATA[WOOD FIRM FINED #30,000 OVER TEEN'S LOST ARM<]]>
</unclassified>

Or maybe you typed a less-than sign that you didn't really intend, in which case simply remove it:

<unclassified>
  WOOD FIRM FINED #30,000 OVER TEEN'S LOST ARM
</unclassified>
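
As a rough check that the CDATA form yields the same text as escaping with &lt;, here is a small Java sketch. The class name `CdataDemo` and the helper `text` are invented for this example. It assumes the standard JAXP `DocumentBuilderFactory` and parses from in-memory strings rather than from your files.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

class CdataDemo {
    // Hypothetical helper: parses the XML string and returns the trimmed
    // text content of the root element.
    static String text(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            // Merge CDATA sections into the surrounding text nodes.
            factory.setCoalescing(true);
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return doc.getDocumentElement().getTextContent().trim();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String cdata = "<unclassified>\n  <![CDATA[WOOD FIRM FINED #30,000 OVER TEEN'S LOST ARM<]]>\n</unclassified>";
        String escaped = "<unclassified>\n  WOOD FIRM FINED #30,000 OVER TEEN'S LOST ARM&lt;\n</unclassified>";

        System.out.println(text(cdata));                       // the headline, ending in '<'
        System.out.println(text(cdata).equals(text(escaped))); // true
    }
}
```

Either way the file itself has to change. CDATA is convenient when the text contains many such characters, while &lt; is enough for a single stray one.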
David Ruiz