I have an XML file that is nearly correct, but it fails to parse:

Error on line 302211.
Extra Content at the end of the document.

I've spent literally two days trying to debug this, but the file is so big that it's nearly impossible. Is there anything I can do?

Here are the relevant lines (I include the 2 lines before the reported error; the error begins at the <seg> tag).

 <tu>
   <tuv xml:lang="en"> 
    <prop type="feed"></prop>
    <seg>
        <bpt i="1" x="1" type="feed">
            test
        </bpt>
        To switch on computer:
        <ept i="1">
            &gt;
        </ept>
        Press device 
        <ph x="2" type="feed">
            &lt;schar _TR=&quot;123&quot; y.io.name
        </ph> or 
        <ph x="3" type="feed">
            &lt;schar _TR=&quot;274&quot; y.io.name=&quot;
        </ph> (Spain) twice. 
    </seg>
 </tuv>
</tu>

Can anyone give me some pointers on finding the issue here? I am using the Notepad++ XML plugin.

Simon Kiely

1 Answer

Background notes

  • The XML fragment you've posted stands on its own as a well-formed XML document – the problem must be somewhere else in your XML.
  • Your particular XML problem is well-formedness, not validity.
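The first point can be checked mechanically. The following is a minimal sketch, assuming Python's standard-library `xml.etree.ElementTree` (not a tool mentioned in the question) and a whitespace-condensed copy of the posted fragment; `is_well_formed` is a hypothetical helper name. It also shows one typical way to provoke this class of error: content that appears after the root element has closed. The expat parser behind ElementTree words its messages differently from the one quoted in the question.

```python
import xml.etree.ElementTree as ET

# Whitespace-condensed copy of the fragment posted in the question.
FRAGMENT = """<tu>
  <tuv xml:lang="en">
    <prop type="feed"></prop>
    <seg>
      <bpt i="1" x="1" type="feed">test</bpt>
      To switch on computer:
      <ept i="1">&gt;</ept>
      Press device
      <ph x="2" type="feed">&lt;schar _TR=&quot;123&quot; y.io.name</ph> or
      <ph x="3" type="feed">&lt;schar _TR=&quot;274&quot; y.io.name=&quot;</ph> (Spain) twice.
    </seg>
  </tuv>
</tu>"""


def is_well_formed(text):
    """Return (True, None) if text parses, else (False, parser message)."""
    try:
        ET.fromstring(text)
        return True, None
    except ET.ParseError as err:
        return False, str(err)


# The fragment on its own parses cleanly.
print(is_well_formed(FRAGMENT))

# A second top-level element after the root has closed does not.
print(is_well_formed(FRAGMENT + "\n<tu></tu>"))
```

If the second call fails near the reported line in the real file, a stray element or text after the closing root tag is a likely suspect.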

Tips for finding XML well-formedness problems

  1. Use an XML parser with better diagnostic messages. Xerces-based tools have very good messages (albeit with a few exceptions).
  2. Know the common problems that cause an XML document not to be well-formed.
  3. Divide and conquer. Consider this sketch of a huge XML document:

    <root>
       <First>
           <FirstChild>
              <!-- Tons of descendent markup -->
           </FirstChild>
           <SecondChild>
              <!-- Tons of descendent markup -->
           </SecondChild>
       </First>
       <Second>
           <!-- Tons of descendent markup -->
       </Second>
    </root>
    

    Process of elimination:

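    1. Delete the First element.
    2. Re-check the document for well-formedness.
    3. If the error goes away, the problem is inside First: restore First, remove Second, and then remove FirstChild to narrow it down further.
    4. Otherwise, the problem is outside First: leave First deleted and narrow down within what remains.
    5. Repeat until the error can be more easily spotted in the reduced XML document.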
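When the file is too large to navigate comfortably in an editor, a short script can report where the parser gives up and print the surrounding lines. This is a minimal sketch, assuming Python's standard-library expat bindings rather than a Xerces-based tool; `find_first_error` and its `context` parameter are hypothetical names chosen for illustration.

```python
import xml.parsers.expat
from itertools import islice


def find_first_error(path, context=2):
    """Parse the file at path.

    Return None if it is well-formed; otherwise return
    (line, column, message, surrounding_lines) for the first error,
    where surrounding_lines is a list of (line_number, text) pairs.
    """
    parser = xml.parsers.expat.ParserCreate()
    try:
        with open(path, "rb") as handle:
            parser.ParseFile(handle)  # reads the file in chunks
        return None
    except xml.parsers.expat.ExpatError as err:
        first = max(err.lineno - context, 1)
        last = err.lineno + context
        with open(path, "rb") as handle:
            # Stop reading once the last line of the window is reached.
            window = islice(enumerate(handle, start=1), first - 1, last)
            lines = [
                (number, raw.decode("utf-8", "replace").rstrip("\r\n"))
                for number, raw in window
            ]
        message = xml.parsers.expat.ErrorString(err.code)
        return err.lineno, err.offset, message, lines
```

Calling `find_first_error` on the problem file returns the line and column of the first well-formedness error along with a few lines of context. Only the first error is reported, so after fixing it the script needs to be run again. It can also be rerun after each step of the process of elimination to confirm whether the error is still present.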

kjhughes
  • Many thanks for this. Can you recommend a Xerces based parser for me on Windows please? :) – Simon Kiely Nov 28 '17 at 13:57
  • I stopped tracking the implementation technologies of XML tools long ago, but if you're not up for using [**Xerces**](http://xerces.apache.org/) directly, try [**Oxygen XML Editor**](https://www.oxygenxml.com/) for a full editor or [**Saxon .NET**](https://www.saxonica.com/download/dotnet.xml) for command line work. – kjhughes Nov 28 '17 at 14:11