I want a quick function that could be part of my XML parser. I do not want to parse the whole string and check whether it is correct XML.

Shadow
  • https://www.w3.org/TR/2006/REC-xml11-20060816/#sec-common-syn – Ondrej Tucny Jun 24 '20 at 19:24
  • @OndrejTucny I know that I can write mine from scratch, but you know - programming is about reusing existing code, not about writing everything from scratch. – Shadow Jun 24 '20 at 19:28
  • The point is it's not a matter of checking a single character without context. If you are writing an XML parser, then you **must** follow the XML specification. – Ondrej Tucny Jun 24 '20 at 19:44
  • Related at the lexical level: [Unicode Regex; Invalid XML characters](https://stackoverflow.com/q/397250/290085) – kjhughes Jun 25 '20 at 13:23
  • I have removed example code as apparently it was confusing. – Shadow Jun 25 '20 at 20:48

1 Answer

This is not really doable without parsing or, in a limited form, at least without using a regular expression. XML names allow a different set of characters in the first position than in the second and later positions; see the Name production in the XML specification.
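To make the first-character versus later-character distinction concrete, here is a minimal hand-written sketch of the `NameStartChar`, `NameChar`, and `Name` productions from XML 1.0 (Fifth Edition). The class `XmlNameSketch` and its methods are hypothetical names chosen for illustration; they are not part of .NET, and the sketch is not guaranteed to agree with the framework's own checks in every edge case.

```csharp
using System;

// Hypothetical helper (not a .NET API): a sketch of the Name production
// from the XML 1.0 (Fifth Edition) specification.
public static class XmlNameSketch
{
    // NameStartChar: code points allowed in the first position of a name.
    public static bool IsNameStartChar(int cp) =>
        cp == ':' || cp == '_' ||
        (cp >= 'A' && cp <= 'Z') || (cp >= 'a' && cp <= 'z') ||
        (cp >= 0xC0 && cp <= 0xD6) || (cp >= 0xD8 && cp <= 0xF6) ||
        (cp >= 0xF8 && cp <= 0x2FF) || (cp >= 0x370 && cp <= 0x37D) ||
        (cp >= 0x37F && cp <= 0x1FFF) || (cp >= 0x200C && cp <= 0x200D) ||
        (cp >= 0x2070 && cp <= 0x218F) || (cp >= 0x2C00 && cp <= 0x2FEF) ||
        (cp >= 0x3001 && cp <= 0xD7FF) || (cp >= 0xF900 && cp <= 0xFDCF) ||
        (cp >= 0xFDF0 && cp <= 0xFFFD) || (cp >= 0x10000 && cp <= 0xEFFFF);

    // NameChar: everything in NameStartChar, plus characters that are
    // only allowed after the first position ('-', '.', digits, ...).
    public static bool IsNameChar(int cp) =>
        IsNameStartChar(cp) ||
        cp == '-' || cp == '.' || (cp >= '0' && cp <= '9') ||
        cp == 0xB7 || (cp >= 0x300 && cp <= 0x36F) ||
        (cp >= 0x203F && cp <= 0x2040);

    // Name ::= NameStartChar (NameChar)*
    public static bool IsName(string s)
    {
        if (string.IsNullOrEmpty(s)) return false;
        for (int i = 0; i < s.Length; i++)
        {
            // Reject unpaired surrogates before converting to a code point.
            if (char.IsSurrogate(s[i]) && !char.IsSurrogatePair(s, i)) return false;
            int cp = char.ConvertToUtf32(s, i);
            bool ok = i == 0 ? IsNameStartChar(cp) : IsNameChar(cp);
            if (!ok) return false;
            // Skip the low surrogate of a pair that was just consumed.
            if (char.IsHighSurrogate(s[i])) i++;
        }
        return true;
    }
}
```

With this sketch, `XmlNameSketch.IsName("Grid")` is accepted while `XmlNameSketch.IsName("1Grid")` is rejected, because a digit is a `NameChar` but not a `NameStartChar`.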

Should you implement IsValidXmlChar without a context, i.e. just checking if the given character is a NameChar, as per the XML specification, the output of your example would be GridAttributeStuff.

So you should at least tokenize the input text to retrieve valid names, and parse the input to retrieve element names, i.e. output Grid in your example.
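As a rough illustration of letting a tokenizer find the element name, the sketch below uses `XmlReader` to stop at the first element and report its name. The original example code is no longer in the question, so the input `<Grid Attribute="Stuff" />` mentioned afterwards is an assumed stand-in, and `ElementNameSketch.FirstElementName` is a hypothetical helper name.

```csharp
using System;
using System.IO;
using System.Xml;

// Hypothetical helper: reads only as far as the first element.
public static class ElementNameSketch
{
    // Returns the name of the first element, or null if the content
    // does not start with an element. Malformed input may throw XmlException.
    public static string FirstElementName(string xml)
    {
        using (var reader = XmlReader.Create(new StringReader(xml)))
        {
            // MoveToContent skips the XML declaration, comments and whitespace.
            return reader.MoveToContent() == XmlNodeType.Element
                ? reader.Name
                : null;
        }
    }
}
```

For the assumed input `<Grid Attribute="Stuff" />`, `FirstElementName` returns `Grid`; the attribute name and value are never mistaken for part of the element name because the reader tracks context.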

To check whether a string is an XML name, the XmlReader class offers the static IsName method. To categorize characters in XML text, there is the XmlCharType struct in .NET Framework as well as in .NET Core, but it is internal.
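A minimal sketch of how the check might be wrapped follows. `XmlReader.IsName` is the real static method; the wrapper class `XmlNameChecks` and its method names are hypothetical. Since `XmlCharType` is internal, the per-character helper uses the public `XmlConvert.IsStartNCNameChar` and `XmlConvert.IsNCNameChar` methods instead, which is a suggestion of mine rather than something the answer mentions. Note that those two test NCName characters, so they reject the colon.

```csharp
using System;
using System.Xml;

// Hypothetical wrapper around the public .NET name checks.
public static class XmlNameChecks
{
    // True if the whole string is a valid XML name.
    // XmlReader.IsName throws on null, so guard against it here.
    public static bool IsValidName(string candidate) =>
        !string.IsNullOrEmpty(candidate) && XmlReader.IsName(candidate);

    // Per-character check that still needs context: the caller must say
    // whether the character is in the first position of the name.
    public static bool IsValidNCNameChar(char ch, bool isFirst) =>
        isFirst ? XmlConvert.IsStartNCNameChar(ch) : XmlConvert.IsNCNameChar(ch);
}
```

For example, `XmlNameChecks.IsValidName("Grid")` is true and `XmlNameChecks.IsValidName("1Grid")` is false, while `IsValidNCNameChar('1', isFirst: false)` is true and `IsValidNCNameChar('1', isFirst: true)` is false.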

Ondrej Tucny
  • Thanks, XmlReader.IsName is what I was looking for! – Shadow Jun 24 '20 at 19:59
  • OP's question is unclear: The title seems to be asking one thing (allowed chars in tag names), which I believe you addressed (+1), and the code seems to be after another thing (allowed characters anywhere in an XML document), which the duplicate link I've added addresses. – kjhughes Jun 24 '20 at 20:01
  • @Shadow, you thank people by clicking the green checkmark at the upper-left of their answer. – Dour High Arch Jun 24 '20 at 20:01
  • @kjhughes I've edited question, because as Ondrej pointed out in his answer my code was faulty. – Shadow Jun 25 '20 at 10:41
  • @Shadow: Ok, but your code still reflects a lexical, not a parse-based, test of character legality, yet your title reflects a parse-based concern. Anyway, since you accepted this answer, it's clear that it's your title, not your code, that best expresses your problem. In that case, the duplicate link is less helpful, so I've converted it to a comment. – kjhughes Jun 25 '20 at 13:22