What is the correct way to represent null XML elements?

Question

I have seen null elements represented in several ways:

The element is present with xsi:nil="true":

 <book>
     <title>Beowulf</title>
     <author xsi:nil="true"/>
 </book>

The element is present, but represented as an empty element (which I believe is wrong since 'empty' and null are semantically different):

 <book>
     <title>Beowulf</title>
     <author/>
 </book>

 <!-- or: -->
 <book>
     <title>Beowulf</title>
     <author></author>
 </book>

The element is not present at all in the returned markup:

 <book>
     <title>Beowulf</title>
 </book>

The element has a <null/> child element (from TStamper below):

 <book>
     <title>Beowulf</title>
     <author><null/></author>
 </book>

Is there a correct or canonical way to represent such a null value? Are there other ways besides the examples above?

The XML for the examples above is contrived, so don't read too far into it. :)

KitsuneYMG · Accepted Answer · 2014-04-09T13:39:40.733

xsi:nil is the correct way to represent a value such that, when the DOM Level 2 call getElementValue() is issued, the NULL value is returned. xsi:nil is also used to indicate a valid element with no content, even if that element's content type normally doesn't allow empty elements.

If an empty tag is used, getElementValue() returns the empty string (""). If the tag is omitted, then no author tag is present at all. This may be semantically different from setting it to 'nil' (e.g., setting "Series" to nil may mean that the book belongs to no series, while omitting series could mean that series is an inapplicable element for the current element).
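The three cases can be told apart programmatically. The following is a minimal sketch using Python's standard `xml.etree.ElementTree` rather than the DOM call mentioned above; the helper name `author_state` and the sample documents are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

XSI = "http://www.w3.org/2001/XMLSchema-instance"
NIL = f"{{{XSI}}}nil"  # ElementTree spells qualified names as {namespace}local


def author_state(book_xml: str) -> str:
    """Classify the <author> child as 'missing', 'nil', 'empty' or 'value'."""
    book = ET.fromstring(book_xml)
    author = book.find("author")
    if author is None:
        return "missing"
    # xsi:nil is an xs:boolean, so both "true" and "1" mean true.
    if author.get(NIL) in ("true", "1"):
        return "nil"
    if not author.text and len(author) == 0:
        return "empty"
    return "value"


# Hypothetical sample documents mirroring the question's examples.
nil_doc = f'<book xmlns:xsi="{XSI}"><title>Beowulf</title><author xsi:nil="true"/></book>'
empty_doc = "<book><title>Beowulf</title><author/></book>"
missing_doc = "<book><title>Beowulf</title></book>"

for doc in (nil_doc, empty_doc, missing_doc):
    print(author_state(doc))
```

Whether a consumer treats "missing" and "nil" as the same thing is an application decision; the sketch only shows that the markup keeps them distinguishable.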

From: The W3C

XML Schema: Structures introduces a mechanism for signaling that an element should be accepted as ·valid· when it has no content despite a content type which does not require or even necessarily allow empty content. An element may be ·valid· without content if it has the attribute xsi:nil with the value true. An element so labeled must be empty, but can carry attributes if permitted by the corresponding complex type.

A clarification:
If you have a book XML element and one of its child elements is book:series, you have several options when filling it out:

1. Removing the element entirely - This can be done when you wish to indicate that series does not apply to this book or that the book is not part of a series. In this case, templates matching book:series in XSL transforms (or handlers in other event-based processors) will never be invoked. For example, if your XSL turns the book element into a table row (xhtml:tr), you may get the wrong number of table cells (xhtml:td) using this method.
2. Leaving the element empty - This could indicate that the series is "", or is unknown, or that the book is not part of a series. Any XSL transform (or other event-based parser) that matches book:series will be called. The value of current() will be "". You will get the same number of xhtml:td tags using this method as with the next one.
3. Using xsi:nil="true" - This signifies that the book:series element is NULL, not just empty. Any XSL transform (or other event-based parser) that has a template matching book:series will be called. The value of current() will be empty (not the empty string). The main difference between this method and (2) is that the schema type of book:series does not need to allow the empty string ("") as a valid value. This makes no real sense for a series element, but for a language element that is defined as an enumerated type in the schema, xsi:nil="true" allows the element to have no data. Another example would be elements of type decimal. If you want them to be empty, you can either use a union of a decimal and an enumerated string that only allows "", or use a decimal that is nillable.
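The table-cell effect described in the three options can be sketched without XSLT. The example below is an assumption-laden illustration in Python: the function `book_to_cells`, the unprefixed element names, and the sample data are all hypothetical, and a cell is simply produced for every child element that is present.

```python
import xml.etree.ElementTree as ET

XSI = "http://www.w3.org/2001/XMLSchema-instance"
NIL = f"{{{XSI}}}nil"


def book_to_cells(book):
    """Return one cell per child element: None for nil, text otherwise."""
    cells = []
    for child in book:
        if child.get(NIL) in ("true", "1"):
            cells.append(None)          # option 3: explicit null
        else:
            cells.append(child.text or "")  # option 2 yields ""
    return cells


BOOKS = f"""<books xmlns:xsi="{XSI}">
  <book><title>A</title></book>
  <book><title>B</title><series/></book>
  <book><title>C</title><series xsi:nil="true"/></book>
</books>"""

rows = [book_to_cells(book) for book in ET.fromstring(BOOKS)]
print(rows)  # [['A'], ['B', ''], ['C', None]]
```

The first row is one cell short because the omitted element never triggers any processing, which is the misalignment option 1 warns about.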

Using xsi:nil is correct, but you should ensure that it is within the proper namespace: xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" — STW, May 01 '09 at 01:42
It's actually `xmlns:xsi="http://w3.org/2001/XMLSchema-instance"`. Note the missing http://. It's important because the namespace string is actually just a string to the xml parser and not an uri. — Burak Arslan, Jan 01 '15 at 21:30
Heh, I believe that is still slightly wrong. It should be `xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"`. Note "www.". See http://www.w3.org/TR/xmlschema-1/#no-xsi — Janne Mattila, Feb 16 '15 at 12:30
As stated in my answer, I disagree with this interpretation, since it's not a representation of the state of the element but a constraint on the usage of the element — Oakcool, Nov 05 '15 at 23:42
Per [www.w3.org/TR/xmlschema-1](https://www.w3.org/TR/xmlschema-1/#no-xsi), the `xsi:nil` attribute declaration is *built-in*, so the `xsi` namespace need not (must not?) be declared. — ChrisV, Jul 18 '16 at 21:04
@ChrisV: Not true, the `xsi:` prefix must be declared. A namespace-aware XML parser will reject your XML document if you attempt to use the `xsi:` prefix without declaring it. The relevant spec here is http://www.w3.org/TR/xml-names/#nsc-NSDeclared ("Namespace constraint: Prefix Declared") which says the only predefined prefixes are `xml:` and `xmlns:`. XML Schema builds on top of the XML namespaces spec but doesn't add any additional predefined prefixes to it, since doing that would actually violate the XML namespaces spec. — Simon Kissane, Aug 10 '16 at 05:42
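The namespace points raised in these comments can be checked with a namespace-aware parser. This is a small sketch using Python's standard `xml.etree.ElementTree` (which is backed by expat); the helper names `parses` and `nil_attribute_visible` are made up for the example.

```python
import xml.etree.ElementTree as ET

XSI = "http://www.w3.org/2001/XMLSchema-instance"
NIL = f"{{{XSI}}}nil"


def parses(xml_text: str) -> bool:
    """True if the namespace-aware parser accepts the document."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


def nil_attribute_visible(xml_text: str) -> bool:
    """True if the parsed root carries the real xsi:nil attribute."""
    return NIL in ET.fromstring(xml_text).attrib


undeclared = '<author xsi:nil="true"/>'
declared = f'<author xmlns:xsi="{XSI}" xsi:nil="true"/>'
# Same prefix, but bound to a URI without "www.": a different namespace.
wrong_uri = '<author xmlns:xsi="http://w3.org/2001/XMLSchema-instance" xsi:nil="true"/>'

print(parses(undeclared))                # False: unbound prefix
print(nil_attribute_visible(declared))   # True
print(nil_attribute_visible(wrong_uri))  # False
```

The last case parses fine but is not recognized as xsi:nil, because namespace names are compared as plain strings.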

StaxMan · Answer 2 · 2015-11-04T19:25:09.380

13

There is no canonical answer, since XML fundamentally has no null concept. But I assume you want XML/object mapping (since object graphs have nulls), so the answer for you is "whatever your tool uses". If you write the handling yourself, that means whatever you prefer. For tools that use XML Schema, xsi:nil is the way to go. For most mappers, omitting the matching element or attribute is the way to do it.
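As an illustration of hand-written mapping, here is a minimal sketch that serializes a null field either by omitting the element or by emitting xsi:nil. The function `book_to_xml` and its `use_nil` flag are hypothetical and do not correspond to any particular mapping tool.

```python
import xml.etree.ElementTree as ET
from typing import Optional

XSI = "http://www.w3.org/2001/XMLSchema-instance"
NIL = f"{{{XSI}}}nil"

# Ask ElementTree to serialize this namespace with the conventional prefix.
ET.register_namespace("xsi", XSI)


def book_to_xml(title: str, author: Optional[str], use_nil: bool) -> str:
    """Serialize a book; a None author is either omitted or marked nil."""
    book = ET.Element("book")
    ET.SubElement(book, "title").text = title
    if author is not None:
        ET.SubElement(book, "author").text = author
    elif use_nil:
        ET.SubElement(book, "author", {NIL: "true"})
    # else: omit the element entirely
    return ET.tostring(book, encoding="unicode")


print(book_to_xml("Beowulf", None, use_nil=False))
print(book_to_xml("Beowulf", None, use_nil=True))
```

Whichever convention is chosen, the reader on the other side has to apply the same one, which is the practical meaning of "whatever your tool uses".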

edited Nov 04 '15 at 19:25

answered Apr 23 '09 at 03:54

StaxMan

score 9 · Answer 3 · answered Apr 21 '09 at 19:32

It depends on how you validate your XML. If you use XML Schema validation, the correct way of representing null values is with the xsi:nil attribute.

[Source]

Tormod Fjeldskår

score 9 · Answer 4 · edited Sep 14 '21 at 18:42

The documentation in the w3 link:

http://www.w3.org/TR/REC-xml/#sec-starttags

says that these are the recommended forms:

<test></test>
<test/>

The attribute mentioned in the other answer is a validation mechanism and not a representation of state. Please refer to: http://www.w3.org/TR/xmlschema-1/#xsi_nil

XML Schema: Structures introduces a mechanism for signaling that an element should be accepted as ·valid· when it has no content despite a content type which does not require or even necessarily allow empty content. An element may be ·valid· without content if it has the attribute xsi:nil with the value true. An element so labeled must be empty, but can carry attributes if permitted by the corresponding complex type.

To clarify this answer:

<?xml version="1.0" encoding="utf-8" ?>
<Books xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <Book>
    <!--This element should always be empty-->
    <BuildAttributes HardCover="true" Glued="true" xsi:nil="true"/>
    <Index></Index>
    <pages>
      <page pageNumber="1">Content</page>
    </pages>
    <!--Valid representation of a null or empty ISBN-->
    <ISBN></ISBN>
  </Book>

  <Book>
    <!--Invalid construct, since xsi:nil="true" signals that the element must be empty-->
    <BuildAttributes HardCover="true" Glued="true" xsi:nil="true">
      <anotherAttribute name="Color">Blue</anotherAttribute>
    </BuildAttributes>
    <Index></Index>
    <pages>
      <page pageNumber="1">Content</page>
    </pages>
    <!--A missing ISBN could be confusing and misleading, since it's not present-->
  </Book>
</Books>
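The rule quoted from the specification (an element labeled xsi:nil="true" must be empty, but may carry attributes) can be checked mechanically. The sketch below is not a schema validator; `nil_violations` is a hypothetical helper that only flags nil elements with text or child elements, run against a shortened version of the document above.

```python
import xml.etree.ElementTree as ET

XSI = "http://www.w3.org/2001/XMLSchema-instance"
NIL = f"{{{XSI}}}nil"


def nil_violations(root):
    """Return tags of elements marked xsi:nil="true" that are not empty."""
    bad = []
    for el in root.iter():
        if el.get(NIL) in ("true", "1"):
            # Attributes are allowed; character data and child elements are not.
            if el.text or len(el) > 0:
                bad.append(el.tag)
    return bad


DOC = f"""<Books xmlns:xsi="{XSI}">
  <Book>
    <BuildAttributes HardCover="true" Glued="true" xsi:nil="true"/>
  </Book>
  <Book>
    <BuildAttributes HardCover="true" Glued="true" xsi:nil="true">
      <anotherAttribute name="Color">Blue</anotherAttribute>
    </BuildAttributes>
  </Book>
</Books>"""

print(nil_violations(ET.fromstring(DOC)))  # ['BuildAttributes']
```

Only the second BuildAttributes element is reported; the first carries attributes but no content, which the quoted rule permits.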

That's the recommendation for *empty* elements; are you of the opinion that empty === null? I believe there's a difference between the two, although it's often situational. If you are making the statement that they're the same, I'd recommend mentioning that argument in your answer. — Rob Hruska, May 04 '10 at 13:30
Empty is not the same as null; if it was, this stackoverflow question would never have been asked. This answer is wrong. However, the programmer should determine whether logic that will be reading the xml is prepared to handle a missing element or xsi:nil; if not, it might be necessary to use one of these forms; that is, it may be necessary to lose the distinction between null/missing element and an empty element. — ToolmakerSteve, Nov 04 '15 at 18:27
@RobHruska yes, you are right, it is the definition of an empty element. But if you take into consideration the W3C definition pointed to by KitsuneYMG, it defines that the element must be null, and I believe that representation is more a definition of the tag than a representation of its current state. So I disagree with that answer and believe that the empty element is the best representation of a null element. The idea is simple: to maintain good structure, you need all elements to be represented; otherwise you would not know of their existence and could therefore misrepresent them. — Oakcool, Nov 05 '15 at 23:15

score 4 · Answer 5 · answered Apr 22 '09 at 03:48

You use xsi:nil when your schema semantics indicate that an element has a default value, and that the default value should be used if the element isn't present. I have to assume that there are smart people to whom the preceding sentence is not a self-evidently terrible idea, but it sounds like nine kinds of bad to me. Every XML format I've ever worked with represents null values by omitting the element. (Or attribute, and good luck marking an attribute with xsi:nil.)

Robert Rossney

If in a document publication app you want the date on the title page to default to the current date if the element has no content, omitting the `date` element entirely is not much help, since the app will have no idea where on the title page you want the date to appear. (If the omitted element has only one possible location, this is not an issue; in real document vocabularies almost all elements have many possible locations.) – C. M. Sperberg-McQueen Oct 18 '17 at 17:31

score 4 · Answer 6 · answered Apr 23 '09 at 03:44

Simply omitting the attribute or element works well in less formal data.

If you need more sophisticated information, the GML schemas add the attribute nilReason, e.g., in GeoSciML:

- xsi:nil with a value of "true" is used to indicate that no value is available
- nilReason may be used to record additional information for missing values; this may be one of the standard GML reasons (missing, inapplicable, withheld, unknown), or text prepended by other:, or may be a URI link to a more detailed explanation.

When you are exchanging data, the role for which XML is commonly used, data sent to one recipient or for a given purpose may have content obscured that would be available to someone else who paid or had different authentication. Knowing the reason why content was missing can be very important.

Scientists also are concerned with why information is missing. For example, if it was dropped for quality reasons, they may want to see the original bad data.
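A reader that keeps the reason alongside the missing value might look like the sketch below. The element names (`sample`, `depth`, `age`) and the helper `read_measurement` are invented for illustration and are not taken from GML or GeoSciML; only the xsi:nil and nilReason attributes come from the description above.

```python
import xml.etree.ElementTree as ET

XSI = "http://www.w3.org/2001/XMLSchema-instance"
NIL = f"{{{XSI}}}nil"


def read_measurement(el):
    """Return (value, reason): value is None when the element is nil."""
    if el.get(NIL) in ("true", "1"):
        # nilReason is optional, so this may itself be None.
        return None, el.get("nilReason")
    return float(el.text), None


SAMPLE = f"""<sample xmlns:xsi="{XSI}">
  <depth>12.5</depth>
  <age xsi:nil="true" nilReason="withheld"/>
</sample>"""

root = ET.fromstring(SAMPLE)
print(read_measurement(root.find("depth")))  # (12.5, None)
print(read_measurement(root.find("age")))    # (None, 'withheld')
```

Returning the reason next to the value lets downstream code distinguish, say, withheld data from data that was never measured.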

score 2 · Answer 7 · answered Apr 21 '09 at 19:42

In many cases the purpose of a null value is to stand in for a data value that was not present in a previous version of your application.

So say you have an XML file from version 1 of your application "ReportMaster".

Now, in ReportMaster version 2, some more attributes have been added that may or may not be defined.

If you use the 'no tag means null' representation you get automatic backward compatibility for reading your ReportMaster 1 xml file.
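A sketch of that backward compatibility, assuming the newer data is stored as child elements: the `reviewer` field, the `load_report` helper, and both sample files are hypothetical. `Element.findtext` returns `None` when the element is absent, so a version 1 file needs no special handling.

```python
import xml.etree.ElementTree as ET


def load_report(xml_text: str) -> dict:
    """Read a report; fields absent from the file come back as None."""
    root = ET.fromstring(xml_text)
    return {
        "title": root.findtext("title"),
        # Added in the hypothetical version 2; missing in version 1 files.
        "reviewer": root.findtext("reviewer"),
    }


V1_FILE = "<report><title>Q1</title></report>"
V2_FILE = "<report><title>Q1</title><reviewer>Ann</reviewer></report>"

print(load_report(V1_FILE))  # {'title': 'Q1', 'reviewer': None}
print(load_report(V2_FILE))  # {'title': 'Q1', 'reviewer': 'Ann'}
```

Note that an empty `<reviewer/>` would come back as `""` rather than `None`, so this convention keeps "empty" and "absent" distinct.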
