-2

I have a short question. Why should XML files be validated and not just well-formed?

I have looked in several places for an answer to this question but have not found a good one.

Rapidistul

2 Answers

1

"Well-formed" simply means that the document follows XML's syntax rules: every start tag has a matching end tag, elements are properly nested, you haven't missed any angle brackets, etc.

OTOH, "validated" means that the XML has been checked against either a DTD or a schema. These let you limit the type or the range of an element's contents, or specify which elements are required and which are optional, etc.

For example, let's say that you have an element called "age". You could use your schema to require that it be an integer in the range 1 to 100.

Or, let's say that you have an element called "color". You could limit the contents to red, blue, or green.
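
As a rough illustration of what those two rules buy you, here is a minimal Python sketch. It is not real schema validation: Python's standard library has no DTD or schema validator, so the function below is a hand-written stand-in for the constraints a schema would declare. The `<person>` wrapper element and the name `check_person` are made up for this example.

```python
import xml.etree.ElementTree as ET

ALLOWED_COLORS = {"red", "blue", "green"}  # the enumeration from the color example


def check_person(xml_text):
    """Return a list of rule violations; an empty list means the data passed.

    Hand-written stand-in for schema rules, not an actual schema validator.
    """
    root = ET.fromstring(xml_text)  # raises ET.ParseError if not well-formed
    problems = []

    age = root.findtext("age", default="").strip()
    # isdecimal() accepts only digit characters, so "-5" and "abc" both fail.
    if not (age.isdecimal() and 1 <= int(age) <= 100):
        problems.append(f"age must be an integer from 1 to 100, got {age!r}")

    color = root.findtext("color", default="").strip()
    if color not in ALLOWED_COLORS:
        problems.append(f"color must be red, blue, or green, got {color!r}")

    return problems


good = "<person><age>42</age><color>red</color></person>"
bad = "<person><age>-5</age><color>purple</color></person>"

print(check_person(good))  # []
print(check_person(bad))   # two violations; both documents are well-formed
```

Both documents parse without error, which is the point: only the extra rules tell the good data from the garbage. With a real schema you would declare these rules once and let a validating parser enforce them instead of writing the checks by hand.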

The point is, you could have XML that is well-formed but still useless to you, because it was never validated and is full of garbage data. That's why it's a good idea to do validation. Skipping validation is a frequent failing in projects that decide to use XML. In my experience, the effort saved up front is lost in the long run to bad data.

BTW - w3schools has a good introductory tutorial on schemas.

Daniel Haley
David
  • So, in conclusion, an XML document must be validated if we want to protect it against bad data, etc. Another thing I'm wondering: if we use SAX or DOM, for example, can parsing problems occur when the XML document has not been validated? – Rapidistul May 08 '15 at 01:47
1

Any arbitrary XML that follows the rules of the language is "well-formed".

This is well-formed XML

<Manager fname="John" lname="Doe">
    <Employee fname="Joe" lname="Everyman" />
</Manager>

and so is this

<RandomCamelcaseText />

but this is not

<message> I'm just going to put text here and not close the tag!

This will always be true, in any application, no matter what.
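
If you want to see that distinction in practice, here is a small sketch using Python's standard-library `xml.etree.ElementTree` parser, which checks well-formedness only. The helper name `is_well_formed` is just something I made up for the example.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return True if the text parses as XML, False otherwise."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


manager = """<Manager fname="John" lname="Doe">
    <Employee fname="Joe" lname="Everyman" />
</Manager>"""
random_tag = "<RandomCamelcaseText />"
unclosed = "<message> I'm just going to put text here and not close the tag!"

print(is_well_formed(manager))     # True
print(is_well_formed(random_tag))  # True
print(is_well_formed(unclosed))    # False
```

The parser knows nothing about managers or employees. It only checks the syntax, which is why the answer is the same in any application.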

But suppose we are writing an application that wants to receive customer data in the form of XML. If we don't specify a data format using something like a schema or DTD, then one user might submit this

<Customer fname="John" lname="Doe" />

another might submit this

<Customer>
    <fname>John</fname>
    <lname>Doe</lname>
</Customer>

and another might submit something like this

<meal>
    <spam />
    <eggs />
    <sausage meat="spam" />
</meal>

They're all well-formed XML, but two of them express the right sort of data in completely different formats, and the third expresses the wrong sort of data entirely. By using a data definition and validating against it, we can make sure the data we receive conforms to our expectations.
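
To make that concrete, here is a minimal Python sketch run against the three submissions above. It assumes, purely for illustration, that our application settled on the attribute form. Since Python's standard library does not include a DTD or schema validator, `matches_customer_format` is a hand-written stand-in for what the data definition would declare. In a real project you would write the DTD or schema and use a validating parser (a third-party library such as lxml can do this).

```python
import xml.etree.ElementTree as ET


def matches_customer_format(xml_text):
    """Hypothetical rule: an empty <Customer> with exactly fname and lname attributes."""
    root = ET.fromstring(xml_text)  # all three inputs pass this step
    return (
        root.tag == "Customer"
        and set(root.attrib) == {"fname", "lname"}
        and len(root) == 0  # no child elements allowed
    )


attribute_form = '<Customer fname="John" lname="Doe" />'
element_form = """<Customer>
    <fname>John</fname>
    <lname>Doe</lname>
</Customer>"""
meal = """<meal>
    <spam />
    <eggs />
    <sausage meat="spam" />
</meal>"""

print(matches_customer_format(attribute_form))  # True
print(matches_customer_format(element_form))    # False
print(matches_customer_format(meal))            # False
```

All three parse cleanly, so well-formedness alone cannot tell them apart. Only the check against the agreed format does.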

Colin P. Hill