I am trying to remove a node from a large XML file. With the code below, the serialized form of other elements changes as well. Can someone explain why this happens, or how to avoid it?

    DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
    Document document = dbf.newDocumentBuilder().parse(new File(filePath)); // filePath: source file

    /* Loop until all matching elements are removed:
    while (document.getElementsByTagName("IMFile").getLength() != 0) {
        Element element = (Element) document.getElementsByTagName("IMFile").item(0);
        element.getParentNode().removeChild(element);
    }*/

    // Test: remove only the first occurrence
    Element element = (Element) document.getElementsByTagName("IMFile").item(0);
    element.getParentNode().removeChild(element);

    // Write the modified document to the destination file
    TransformerFactory tf = TransformerFactory.newInstance();
    Transformer t = tf.newTransformer();
    t.transform(new DOMSource(document), new StreamResult(new File(filePath + "_New")));

It changes the order of attributes, for example:

`<Attribute id="7" value="1920" name="width"/>` becomes `<Attribute id="7" name="width" value="1920"/>`

It also collapses empty elements into self-closing tags: `<PowerPointFilename></PowerPointFilename>` becomes `<PowerPointFilename/>`
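For reference, the commented-out loop in the code above could be written as a single backwards pass over the live `NodeList`. This is only a sketch: the class name `RemoveAllDemo`, the helper names, and the sample element names `Project` and `Clip` are made up for illustration, and it does nothing about the attribute-order or empty-tag changes described here.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class RemoveAllDemo {

    // Parse an XML string into a DOM document (hypothetical helper for the demo).
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    // Remove every element with the given tag name; returns how many were removed.
    static int removeAll(Document document, String tagName) {
        NodeList matches = document.getElementsByTagName(tagName);
        int removed = 0;
        // The NodeList is live, so walk it from the end: removing a node
        // never shifts the indices that are still to be visited.
        for (int i = matches.getLength() - 1; i >= 0; i--) {
            Node node = matches.item(i);
            node.getParentNode().removeChild(node);
            removed++;
        }
        return removed;
    }

    public static void main(String[] args) {
        // Sample document with made-up element names around the IMFile elements.
        Document doc = parse("<Project><IMFile id=\"1\"/><Clip/><IMFile id=\"2\"/></Project>");
        System.out.println("Removed: " + removeAll(doc, "IMFile"));
        System.out.println("Remaining: " + doc.getElementsByTagName("IMFile").getLength());
    }
}
```

Compared with re-running `getElementsByTagName` on every iteration, this queries the document once.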

Torewin
  • It's not changing the meaning of the document, just its format. See [Order of XML attributes after DOM processing](http://stackoverflow.com/questions/726395/order-of-xml-attributes-after-dom-processing) for more discussion. – teppic Nov 22 '16 at 20:59
  • Thanks for that tip - I hope it doesn't matter, but it does matter to keep all tags open and closed in the document I am working on. – Torewin Nov 22 '16 at 21:03
  • Are you _sure_ that you need open and close tags? `<PowerPointFilename></PowerPointFilename>` and `<PowerPointFilename/>` are exactly equivalent in XML. – teppic Nov 22 '16 at 21:32
  • I didn't know it was equivalent. I narrowed the error to one node, but the only difference is positioning of the value: https://www.diffchecker.com/E0tY0oN0 Not sure what's wrong with it, but if I copy that node from the original to the new it works completely fine. – Torewin Nov 22 '16 at 22:00
  • What error are you getting? What's raising the error? – teppic Nov 22 '16 at 22:06
  • The program won't open the project. This is a Camtasia video: I've narrowed it down to this section. – Torewin Nov 22 '16 at 22:07
  • Try canonicalising the file without your changes (i.e. read it in and write it out without changes). See if that loads without problems. – teppic Nov 22 '16 at 22:09
  • Yields the same result. I removed the code regarding the element and the doc.normalized() I had in there. Still the same problem in the same place. I'm not sure why as I narrowed it down even further to the bottom of that node. The last `InterpolatingParam` – Torewin Nov 22 '16 at 22:19
  • [This answer](http://stackoverflow.com/a/3728241/3591528) suggests a work-around to maintain attribute ordering. Unfortunately you'll have to rewrite your method to use SAX instead of DOM. – teppic Nov 22 '16 at 22:27
  • That's a shame. I've learned that you cannot remove nodes in SAX or I have not found anyone who has. Thanks for the help. – Torewin Nov 22 '16 at 22:32
  • But you _can_ strip nodes with a SAX transformer. – teppic Nov 22 '16 at 22:49
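
The equivalence claims in the comments above can be checked with a small DOM comparison. The sketch below is illustrative only: the class name `XmlEquivalenceDemo` and its helpers are made up here, and it relies on `Node.isEqualNode`, which compares attribute maps without regard to order. It shows that a conforming XML parser sees no difference; it says nothing about how Camtasia's own reader behaves.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class XmlEquivalenceDemo {

    // Parse an XML string into a DOM document (hypothetical helper for the demo).
    static Document parse(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            return builder.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse: " + xml, e);
        }
    }

    // True when the root elements of both strings are structurally equal DOM trees.
    static boolean sameDom(String first, String second) {
        return parse(first).getDocumentElement()
                .isEqualNode(parse(second).getDocumentElement());
    }

    public static void main(String[] args) {
        // Empty element written with separate tags vs. a self-closing tag.
        System.out.println(sameDom(
                "<PowerPointFilename></PowerPointFilename>",
                "<PowerPointFilename/>"));
        // Same attributes in a different order.
        System.out.println(sameDom(
                "<Attribute id=\"7\" value=\"1920\" name=\"width\"/>",
                "<Attribute id=\"7\" name=\"width\" value=\"1920\"/>"));
    }
}
```

Both comparisons should report the pairs as equal, which is why a consumer that rejects the rewritten file is most likely doing something other than standard XML parsing.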

1 Answer

You can use a SAX transformer to modify an XML document while preserving attribute order:

public static void main(String[] args) throws IOException, TransformerException, SAXException {
    XMLReader reader = XMLReaderFactory.createXMLReader();
    TransformerFactory tf = TransformerFactory.newInstance();
    // Load the transformer definition from the file strip.xsl:
    Transformer t = tf.newTransformer(new SAXSource(reader, new InputSource(new FileInputStream("strip.xsl"))));
    // Transform the file test.xml to stdout:
    t.transform(new SAXSource(reader, new InputSource(new FileInputStream("test.xml"))), new StreamResult(System.out));
}

Here's an XSL transform to strip IMFile elements:

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

    <!-- Copy -->
    <xsl:template match="node()|@*">
        <xsl:copy>
            <xsl:apply-templates select="node()|@*"/>
        </xsl:copy>
    </xsl:template>

    <!-- Strip IMFile elements -->
    <xsl:template match="IMFile"/>
</xsl:stylesheet>
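
To try the stylesheet without creating any files, a self-contained variant might look like the sketch below. It is an illustration under assumptions, not the code from this answer: it uses `StreamSource` over strings instead of the `SAXSource` setup above, the class name `StripImFileDemo` is made up, and the sample element names `Project` and `Clip` are hypothetical. It demonstrates only that `IMFile` elements are dropped and other elements are kept; the exact serialization depends on the XSLT processor.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class StripImFileDemo {

    // The identity-plus-strip stylesheet from this answer, inlined as a string.
    static final String STRIP_XSL =
          "<xsl:stylesheet version=\"1.0\" xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
        + "<xsl:template match=\"node()|@*\">"
        + "<xsl:copy><xsl:apply-templates select=\"node()|@*\"/></xsl:copy>"
        + "</xsl:template>"
        + "<xsl:template match=\"IMFile\"/>"
        + "</xsl:stylesheet>";

    // Apply the stylesheet to an XML string and return the transformed text.
    static String strip(String xml) {
        try {
            Transformer t = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(STRIP_XSL)));
            // Leave out the XML declaration to keep the demo output short.
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("Transform failed", e);
        }
    }

    public static void main(String[] args) {
        // Sample input with made-up element names around an IMFile element.
        System.out.println(strip("<Project><IMFile id=\"1\"/><Clip name=\"a\"/></Project>"));
    }
}
```

The empty `IMFile` template wins over the copy template because a name test has a higher default priority than `node()`, so no explicit `priority` attribute is needed.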
teppic
  • The XML file does not have a stylesheet. I've tried this method. I don't know whether it's because this is a video project file and very different from a normal XML file, or whether it's just generated poorly. The beginning of the document looks like ` ` – Torewin Nov 22 '16 at 22:59
  • The stylesheet doesn't need to be associated with the XML file, it is just used to define a transform to apply to the SAX stream. The code will strip `IMFile` elements from any input XML. – teppic Nov 22 '16 at 23:19
  • Oh, I misunderstood what the XSL file was. Okay, I tried it with your exact code, changing only the directories, and I get the exact same error in the exact same place. If I copy the original node over the same node in the new file, it works. I guess my last option is just searching for the string? – Torewin Nov 22 '16 at 23:48
  • Yeah. Looks like you're in a knock down fight with the parser that Camtasia uses. Good luck. – teppic Nov 22 '16 at 23:58
  • Thanks all for your time and help; I appreciate the tips about Camtasia. Hopefully if I find the exact line that causes the problem I will be able just to search for that line as it should only appear once! – Torewin Nov 23 '16 at 00:00