2

I have a large XML file with many nodes, and some of those nodes are not closed. Deleting the unclosed nodes by hand takes a long time. Is there a way to delete them with code? I can remove a node on one particular line, but how do I do the same across a large XML file?

kjhughes
Gio Frog
  • When you find an open tag that's not closed, do you assume it was supposed to be self-closing or have a body potentially with children nodes? In the latter case do you also delete nodes you suspect were children, using whitespace as a hint? – Sage Mitchell Mar 19 '16 at 16:15
  • If the node isn't closed then it isn't XML. – Lasse V. Karlsen Mar 19 '16 at 16:18
  • It's not **valid** XML. Maybe that is what he is trying to achieve. Are you trying to force content to be valid XML? – Siderite Zackwehdex Mar 19 '16 at 16:34
  • @SideriteZackwehdex: It's not ***well-formed***, which makes it ***not XML*** at all, as Lasse stated. See [**Well-formed vs Valid XML**](http://stackoverflow.com/a/25830482/290085) – kjhughes Mar 19 '16 at 16:38

2 Answers

1

It depends on what you mean by nodes that are not closed. I see several problems:

  • atomic (self-closing) nodes, which end with /> instead of having separate start and end tags
  • nodes that have the same tag name, but different attributes (then how do you know which one is "not closed"? Are they supposed to be parent and child or siblings?)

That is why the problem is not so much related to XML, but to your requirements. An example would be good, so I can be more specific.
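
To illustrate the second point, here is a minimal sketch in Python using the standard `xml.etree.ElementTree` module. The input strings and the `direct_children` helper are hypothetical, made up for illustration. The same broken fragment can be repaired in two ways, both well-formed, and the two repairs give different trees:

```python
import xml.etree.ElementTree as ET

# Hypothetical broken input: the first <item> has no end tag.
broken = '<list><item id="1"><item id="2"></item></list>'

# Two equally plausible repairs, both well-formed.
as_siblings = '<list><item id="1"/><item id="2"></item></list>'
as_nested = '<list><item id="1"><item id="2"></item></item></list>'


def direct_children(text):
    """Return the id attribute of each direct child of the root element."""
    return [child.get("id") for child in ET.fromstring(text)]


try:
    ET.fromstring(broken)
except ET.ParseError as err:
    print("not well-formed:", err)

print(direct_children(as_siblings))  # ['1', '2']
print(direct_children(as_nested))    # ['1']
```

Nothing in the broken input says which of the two trees was intended, so the choice has to come from your requirements.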

Siderite Zackwehdex
  • This should be a comment, not an answer. – kjhughes Mar 19 '16 at 16:26
  • Moreover, your statement, *that is why the problem is not so much related to XML, but to your requirements*, is incorrect. Unclosed elements are most definitely an XML problem, regardless of OP's requirements. – kjhughes Mar 19 '16 at 17:00
1

In XML, an element that is not self-closing must have an end tag. Otherwise, the text you have is simply not XML. However small or large the document is, it must be well-formed to be XML, and well-formedness requires that every element either has an end tag or is self-closing.
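
As a quick check, a conforming parser will reject such input outright. The sketch below uses Python's standard `xml.etree.ElementTree`; the helper name `first_wellformedness_error` and the sample strings are assumptions made for illustration. It reports only the first error the parser hits, not every unclosed element:

```python
import xml.etree.ElementTree as ET


def first_wellformedness_error(text):
    """Return (line, column, message) for the first parse error, or None if well-formed."""
    try:
        ET.fromstring(text)
    except ET.ParseError as err:
        line, column = err.position  # location where the parser gave up
        return line, column, str(err)
    return None


print(first_wellformedness_error("<root><item>ok</item></root>"))
print(first_wellformedness_error("<root><item>oops</root>"))
```

The first call returns `None`; the second returns the position and message of the error, which tells you where the parser stopped, not where the missing end tag belongs.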

Therefore, you cannot expect support from any conforming XML parser or tool to add missing end tags or remove unclosed start tags. Moreover, you'll have trouble writing your own tool to remove or repair unclosed elements because in the general case, it is impossible to be certain where the element was supposed to end.
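
What a script can reasonably do is point you at the suspicious start tags so you can decide what to do with each one. The following is a heuristic sketch in Python, not a parser: the `TOKEN` pattern and the `find_unclosed` function are hypothetical names, and the code assumes that attribute values never contain `<` or `>`. It keeps a stack of open elements and reports every start tag that is still open when an enclosing end tag, or the end of the input, is reached:

```python
import re

# Simplified tokenizer (an assumption of this sketch, not a full XML grammar).
TOKEN = re.compile(
    r"<!--.*?-->"                # comment
    r"|<!\[CDATA\[.*?\]\]>"      # CDATA section
    r"|<\?.*?\?>"                # processing instruction or XML declaration
    r"|<(?P<close>/)?(?P<name>[A-Za-z_][\w:.-]*)(?P<attrs>[^<>]*?)(?P<empty>/)?>",
    re.DOTALL,
)


def find_unclosed(text):
    """Return [(tag_name, line_number)] for start tags with no matching end tag."""
    open_tags = []  # stack of (name, line) for start tags awaiting an end tag
    unclosed = []
    line, last = 1, 0
    for m in TOKEN.finditer(text):
        # Track the line number incrementally so large inputs stay linear.
        line += text.count("\n", last, m.start())
        last = m.start()
        name = m.group("name")
        if name is None or m.group("empty"):
            continue  # comment, CDATA, PI, or self-closing element
        if not m.group("close"):
            open_tags.append((name, line))
            continue
        # End tag: find the nearest open element with the same name.
        depth = next(
            (i for i in range(len(open_tags) - 1, -1, -1) if open_tags[i][0] == name),
            None,
        )
        if depth is None:
            continue  # stray end tag; ignored in this sketch
        unclosed.extend(open_tags[depth + 1:])  # opened inside it, never closed
        del open_tags[depth:]
    unclosed.extend(open_tags)  # still open at end of input
    return sorted(unclosed, key=lambda item: item[1])


sample = """<root>
  <item>one</item>
  <item>two
  <note>three</note>
</root>"""
print(find_unclosed(sample))  # [('item', 3)]
```

The report says only that the `<item>` on line 3 was never closed. Whether to delete the tag, delete the tag and its content, or add an end tag somewhere is still a decision the script cannot make for you.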

Community
kjhughes