I'm having problems with an application that generates XML at runtime and then tries to parse it elsewhere.

In some cases I'm getting an error with the message "error parsing attribute name". Here is an example of an XML document that fails:

<datastore>
   <row id="Timer?ID=0">
      <ID>0</ID>
      <START_TIME_(sec)>120</START_TIME_(sec)>
   </row>
</datastore>

The parser seems to fail as soon as it tries to read the ( character. The same happens with other characters, such as ) and ?.

I thought that the only invalid characters in XML were the ones specified in this answer: https://stackoverflow.com/a/1091953

Any idea why this could be failing?

Duolasa
  • See also: http://stackoverflow.com/questions/2519845/how-to-check-if-string-is-a-valid-xml-element-name – IMSoP Mar 31 '15 at 18:58
  • I'm never sure what kind of answer people want with "why" questions like this. (a) it's invalid because the spec says so. (b) why does the spec say so? (b(i)) is there documented evidence of the rationale that the spec authors used when making this decision? (b(ii)) can you think of any reason why a rational spec author would have made this decision? – Michael Kay Mar 31 '15 at 21:50
  • I'm not aware of any languages in which parentheses are allowed in identifier names. Did you have some rational basis for your expectation? – user207421 Apr 01 '15 at 08:59
  • @MichaelKay I get your point in general, but in this case, the OP had found a reference which they thought meant that these *would* be valid characters. The question is therefore "is the other reference wrong, or am I misunderstanding it?" and the (accepted) answer is "here is what you've misunderstood". – IMSoP Apr 01 '15 at 10:31

2 Answers

The answer you found lists the characters reserved in the text of an XML document, i.e. the contents of elements and the values of attributes. However, your example uses punctuation within the name of an element, which is subject to stricter limits.
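
To illustrate the distinction, here is a minimal sketch using Python's standard-library `xml.etree.ElementTree` (not the parser from the question, so the exact error message will differ). The helper `is_well_formed` is a hypothetical name introduced only for this example.

```python
import xml.etree.ElementTree as ET

# Punctuation such as "?" and "(" is accepted in attribute values and in text.
ok = ET.fromstring('<row id="Timer?ID=0">(sec)</row>')


def is_well_formed(xml_text):
    """Return True if the standard-library parser accepts xml_text."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        # Raised for any well-formedness error, including an invalid name.
        return False


# The same punctuation inside an element name is rejected.
name_with_parens = is_well_formed("<START_TIME_(sec)>120</START_TIME_(sec)>")
name_without_parens = is_well_formed("<START_TIME_sec>120</START_TIME_sec>")
```

Here `name_with_parens` ends up `False` and `name_without_parens` ends up `True`, while the first document parses even though it contains the same characters in its attribute value and text.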

The full list of allowed characters can be found in the XML specification; note that the first character of the name is even further restricted. (XML 1.1 expands the list of allowed characters slightly to reflect evolution of the Unicode standard.) The main thing to notice is that most of the common ASCII punctuation (Unicode code points #x7F and below) is excluded.
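
As a rough sketch, the `NameStartChar` and `NameChar` productions from the XML 1.0 (Fifth Edition) specification can be transcribed into a regular expression. The function name `is_xml_name` is hypothetical, and this check covers only the character ranges; it does not look at the reserved "xml" prefix or at namespace rules.

```python
import re

# Character ranges transcribed from the NameStartChar production (XML 1.0, 5th ed.).
_NAME_START = (
    ":A-Z_a-z"
    "\u00C0-\u00D6\u00D8-\u00F6\u00F8-\u02FF"
    "\u0370-\u037D\u037F-\u1FFF\u200C-\u200D"
    "\u2070-\u218F\u2C00-\u2FEF\u3001-\uD7FF"
    "\uF900-\uFDCF\uFDF0-\uFFFD\U00010000-\U000EFFFF"
)
# NameChar adds "-", ".", digits, and a few extra ranges to NameStartChar.
_NAME_CHAR = _NAME_START + r"\-.0-9" + "\u00B7\u0300-\u036F\u203F-\u2040"

_XML_NAME = re.compile(f"[{_NAME_START}][{_NAME_CHAR}]*")


def is_xml_name(candidate):
    """Return True if candidate matches the XML 1.0 Name production."""
    return _XML_NAME.fullmatch(candidate) is not None


for candidate in ["START_TIME_sec", "START_TIME_(sec)", "0abc", "a-b.c"]:
    print(candidate, is_xml_name(candidate))
```

Names containing parentheses or a question mark fail this check, as does a name that starts with a digit or a hyphen, even though digits and hyphens are allowed in later positions.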

It is common practice to use only names which begin with a letter and continue with letters, digits, underscores and hyphens, but a well-written XML parser should handle a wider range of Unicode characters should you wish to use them.

Names beginning with "xml" (in any combination of upper and lower case) are specially reserved, and names containing colons will be interpreted as using namespaces, so those should also be avoided.
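
A generator that wants to stay within the common practice described above could apply a deliberately conservative check before emitting a name. This is only a sketch: `is_conservative_name` is a hypothetical helper, and restricting "letter" to ASCII is an assumption made here for simplicity, not something the specification requires.

```python
import re

# Assumption: "letter" is limited to ASCII here, which is stricter than the spec.
_CONSERVATIVE_NAME = re.compile(r"[A-Za-z][A-Za-z0-9_-]*")


def is_conservative_name(name):
    """Accept only letter-first names made of letters, digits, "_" and "-".

    Colons are rejected by the pattern, and names starting with "xml"
    in any combination of upper and lower case are rejected explicitly.
    """
    if _CONSERVATIVE_NAME.fullmatch(name) is None:
        return False
    return not name.lower().startswith("xml")


for name in ["START_TIME_sec", "START_TIME_(sec)", "XmlData", "ns:item"]:
    print(name, is_conservative_name(name))
```

Only the first of the four sample names passes; the others are rejected for the parentheses, the reserved prefix, and the colon respectively.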

Note that there is no escape mechanism for these restricted characters; you just have to design your format not to need them.
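
One possible redesign, sketched below, keeps the element name fixed and moves the arbitrary label into an attribute value, where such punctuation is allowed. The `field` element, its `name` attribute, and the `build_row` helper are all hypothetical choices for illustration, not part of the original format.

```python
import xml.etree.ElementTree as ET


def build_row(row_id, values):
    """Build a <row> whose column labels live in attribute values.

    Every child uses the fixed element name "field", so the label can
    contain parentheses, question marks, or spaces without breaking the XML.
    """
    row = ET.Element("row", id=row_id)
    for label, value in values.items():
        field = ET.SubElement(row, "field", name=label)
        field.text = str(value)
    return row


row = build_row("Timer?ID=0", {"ID": 0, "START_TIME_(sec)": 120})
xml_text = ET.tostring(row, encoding="unicode")

# The generated text parses again, and the original labels are recoverable.
parsed = ET.fromstring(xml_text)
labels = {field.get("name"): field.text for field in parsed}
```

The consumer then looks fields up by the `name` attribute instead of by element name, so the generator no longer has to sanitize its labels.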

IMSoP

Those are the characters that must be encoded in an element's text; XML element names follow separate naming rules.

XML elements must follow these naming rules:

  • Element names are case-sensitive
  • Element names must start with a letter or underscore
  • Element names cannot start with the letters xml (or XML, or Xml, etc)
  • Element names can contain letters, digits, hyphens, underscores, and periods
  • Element names cannot contain spaces

    Any name can be used, no words are reserved (except xml).

(source: http://www.w3schools.com/xml/xml_elements.asp)

This means that parentheses are not valid in an element name.

Jonathan
  • As with so much on w3schools, that information is incorrect. The full list of allowed characters at both the beginning and subsequent positions is here: http://www.w3.org/TR/2008/REC-xml-20081126/#NT-NameChar – IMSoP Mar 31 '15 at 18:31
  • Or the even wider list for XML 1.1 here: http://www.w3.org/TR/2006/REC-xml11-20060816/#NT-NameStartChar – IMSoP Mar 31 '15 at 19:08
  • I sometimes wonder why w3schools hasn't either fixed its mistakes or shut down by now. Either would be an improvement. – keshlam Mar 31 '15 at 20:22
  • @keshlam To be fair, they have improved some things over the last couple of years, to the extent that http://www.w3fools.com has toned down its wording significantly. However, I still wouldn't trust them as a primary source for any reference; as in this case, they're often over-simplified to avoid scaring off beginners. – IMSoP Mar 31 '15 at 21:39