I want a quick function that may become part of my XML parser. I do not want to parse the whole string and check whether it is correct XML.
-
https://www.w3.org/TR/2006/REC-xml11-20060816/#sec-common-syn – Ondrej Tucny Jun 24 '20 at 19:24
-
@OndrejTucny I know that I can write my own from scratch, but you know, programming is about reusing existing code, not about writing everything from scratch. – Shadow Jun 24 '20 at 19:28
-
The point is it's not a matter of checking a single character without context. If you are writing an XML parser, then you **must** follow the XML specification. – Ondrej Tucny Jun 24 '20 at 19:44
-
Related at the lexical level: [Unicode Regex; Invalid XML characters](https://stackoverflow.com/q/397250/290085) – kjhughes Jun 25 '20 at 13:23
-
I have removed the example code, as apparently it was confusing. – Shadow Jun 25 '20 at 20:48
1 Answer
This is not really doable without parsing, or at least, in a limited form, without using a regular expression. Names in XML permit one set of characters in the first position and a larger set in the second and subsequent positions; see the Name production in the XML specification.
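The first-character versus later-character distinction can be sketched with the public `XmlConvert` character checks. This is only an illustration under some assumptions: the class and method names below are made up for the example, the `XmlConvert` methods test NCName characters (which exclude the colon), so `':'` is admitted by hand to approximate the Name production, and because they take a single `char`, code points above U+FFFF are not covered.

```csharp
using System.Xml;

// Hypothetical helper; the names here are illustrative, not part of .NET.
public static class XmlNameChars
{
    // True if c may begin an XML name. XmlConvert checks NCName characters,
    // which exclude ':', so the colon is admitted explicitly.
    public static bool IsNameStartChar(char c) =>
        c == ':' || XmlConvert.IsStartNCNameChar(c);

    // True if c may appear after the first character of an XML name.
    public static bool IsNameChar(char c) =>
        c == ':' || XmlConvert.IsNCNameChar(c);
}
```

For instance, `'1'` and `'-'` are rejected by `IsNameStartChar` but accepted by `IsNameChar`, which is why a single context-free "is this character valid" test cannot decide whether a name is valid.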
Should you implement `IsValidXmlChar` without context, i.e. just checking whether the given character is a NameChar as per the XML specification, the output of your example would be `GridAttributeStuff`.
So you should at least tokenize the input text to retrieve valid names, and parse the input to retrieve element names, i.e. output `Grid` in your example.
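To make the difference concrete, here is a rough sketch that contrasts a context-free character filter with reading the first element name through `XmlReader`. The question's original example was removed, so the input `<Grid Attribute="Stuff" />` used in the note below is an assumption chosen to reproduce the two outputs mentioned above; the class and method names are also hypothetical.

```csharp
using System.IO;
using System.Linq;
using System.Xml;

// Hypothetical helper; names are illustrative only.
public static class ElementNameDemo
{
    // Context-free approach: keeps every character that is a name character,
    // regardless of where it occurs in the markup.
    public static string FilterNameChars(string xml) =>
        new string(xml.Where(c => XmlConvert.IsNCNameChar(c)).ToArray());

    // Parse-based approach: reads only as far as the first start tag and
    // returns its name, or an empty string if no element is found.
    // Malformed input makes XmlReader throw an XmlException.
    public static string FirstElementName(string xml)
    {
        var settings = new XmlReaderSettings { ConformanceLevel = ConformanceLevel.Fragment };
        using (var reader = XmlReader.Create(new StringReader(xml), settings))
        {
            while (reader.Read())
            {
                if (reader.NodeType == XmlNodeType.Element)
                    return reader.Name;
            }
        }
        return string.Empty;
    }
}
```

With the assumed input, `FilterNameChars` yields `GridAttributeStuff` because attribute names and values also consist of name characters, while `FirstElementName` yields `Grid`. The reader stops at the first element, so the rest of the string is not parsed.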
To check whether a string is an XML name, the `XmlReader` class offers the static `IsName` method. To categorize characters in an XML text, there is the `XmlCharType` struct in .NET Framework as well as in .NET Core, but it's internal.
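A minimal sketch of the whole-name check with `XmlReader.IsName` might look like the following. The wrapper class and method are hypothetical; the null-or-empty guard is there because `IsName` is not meant to be called with a null argument.

```csharp
using System.Xml;

// Hypothetical wrapper; only XmlReader.IsName is an actual .NET API here.
public static class XmlNameCheck
{
    // True if candidate is a well-formed XML name.
    // Null and empty strings are rejected up front instead of being passed on.
    public static bool IsValidName(string candidate) =>
        !string.IsNullOrEmpty(candidate) && XmlReader.IsName(candidate);
}
```

Under this sketch, `IsValidName("Grid")` is true, while `IsValidName("1Grid")` and `IsValidName("a b")` are false, since a name cannot start with a digit or contain a space.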

-
OP's question is unclear: The title seems to be asking one thing (allowed chars in tag names), which I believe you addressed (+1), and the code seems to be after another thing (allowed characters anywhere in an XML document), which the duplicate link I've added addresses. – kjhughes Jun 24 '20 at 20:01
-
@Shadow, you thank people by clicking the green checkmark at the upper-left of their answer. – Dour High Arch Jun 24 '20 at 20:01
-
@kjhughes I've edited the question because, as Ondrej pointed out in his answer, my code was faulty. – Shadow Jun 25 '20 at 10:41
-
@Shadow: Ok, but your code still reflects a lexical, not a parse-based, test of character legality, yet your title reflects a parse-based concern. Anyway, since you accepted this answer, it's clear that it's your title, not your code, that best expresses your problem. In that case, the duplicate link is less helpful, so I've converted it to a comment. – kjhughes Jun 25 '20 at 13:22