10

I had a look at string escape into XML and found it very useful.

I would like to do a similar thing: Escape a string to be used in an XML-Attribute.

The string may contain \r\n. The XmlWriter class escapes these, producing something like \r\n -> &#xD;&#xA;

The solution I'm currently using includes the XmlWriter and a StringBuilder and is rather ugly.

Any hints?

Edit1:
Sorry to disappoint LarsH, but my first approach was:

public static string XmlEscapeAttribute(string unescaped)
{
    XmlDocument doc = new XmlDocument();
    XmlAttribute attr= doc.CreateAttribute("attr");
    attr.InnerText = unescaped;
    return attr.InnerXml;
}

It does not work. XmlEscapeAttribute("Foo\r\nBar") will result in "Foo\r\nBar"

I used .NET Reflector to find out how the XmlTextWriter escapes attributes. It uses the XmlTextEncoder class, which is internal...

The method I'm currently using looks like this:

public static string XmlEscapeAttribute(string unescaped)
{
    if (String.IsNullOrEmpty(unescaped)) return unescaped;

    XmlWriterSettings settings = new XmlWriterSettings();
    settings.OmitXmlDeclaration = true;
    StringBuilder sb = new StringBuilder();
    XmlWriter writer = XmlWriter.Create(sb, settings);

    writer.WriteStartElement("a");
    writer.WriteAttributeString("a", unescaped);
    writer.WriteEndElement();
    writer.Flush();
    sb.Length -= "\" />".Length;
    sb.Remove(0, "<a a=\"".Length);

    return sb.ToString();
}

It's ugly and probably slow, but it does work: XmlEscapeAttribute("Foo\r\nBar") will result in "Foo&#xD;&#xA;Bar"

Edit2:

SecurityElement.Escape(unescaped);

does not work either.

Edit3 (final):

Using all the very useful comments from Lars, my final implementation looks like this:

Note: the .Replace("\r", "&#xD;").Replace("\n", "&#xA;"); is not required for valid XML. It is a cosmetic measure only!

    public static string XmlEscapeAttribute(string unescaped)
    {

        XmlDocument doc = new XmlDocument();
        XmlAttribute attr= doc.CreateAttribute("attr");
        attr.InnerText = unescaped;
        // The Replace is *not* required!
        return attr.InnerXml.Replace("\r", "&#xD;").Replace("\n", "&#xA;");
    }

As it turns out, this is valid XML and will be parsed by any standards-compliant XML parser:

<response message="Thank you,
LarsH!" />
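
One thing worth checking before dropping the Replace calls: well-formed is not the same as round-trippable. Under the attribute-value normalization rules of XML 1.0 (section 3.3.3), a conforming parser is expected to turn a literal line break inside an attribute value into a space, while a character reference such as &#xA; keeps the line break. The sketch below is one way to compare the two cases. It assumes LINQ to XML (System.Xml.Linq) is available; AttributeNewlineDemo and ReadAttribute are hypothetical names used only for this illustration.

```csharp
using System.Xml.Linq;

public static class AttributeNewlineDemo
{
    // Parses a one-element document and returns the value of its "a" attribute
    // as the parser reports it (that is, after any normalization it applies).
    public static string ReadAttribute(string xml)
    {
        return XDocument.Parse(xml).Root.Attribute("a").Value;
    }
}
```

Calling ReadAttribute("<x a=\"Foo\nBar\" />") and ReadAttribute("<x a=\"Foo&#xA;Bar\" />") and comparing the results shows whether the line break survives parsing. If the reader of the XML needs the original line breaks back, escaping them is more than cosmetic.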
Simon Ottenhaus
  • 1
    Could you modify the technique in the answer you linked to above, so that it creates an attribute node, stuffs the string into the node's innerText, and extracts its innerXML? What happens if you just change CreateElement() to CreateAttribute()? – LarsH Dec 16 '10 at 18:48
  • You should also be sure to escape double quotes. – josh poley Oct 08 '15 at 16:26

3 Answers

9

Modifying the solution you referenced, how about

public static string XmlEscape(string unescaped)
{
    XmlDocument doc = new XmlDocument();
    var node = doc.CreateAttribute("foo");
    node.InnerText = unescaped;
    return node.InnerXml;
}

All I did was change CreateElement() to CreateAttribute(). The attribute node type does have InnerText and InnerXml properties.

I don't have the environment to test this in, but I'd be curious to know if it works.

Update: Or more simply, use SecurityElement.Escape() as suggested in another answer to the question you linked to. This will escape quotation marks, so it's suitable for using for attribute text.

Update 2: Please note that carriage returns and line feeds do not need to be escaped in an attribute value for the XML to be well-formed. If you want them escaped for other reasons, you can do it using String.Replace(), e.g.

SecurityElement.Escape(unescaped).Replace("\r", "&#xD;").Replace("\n", "&#xA;");

or

return node.InnerXml.Replace("\r", "&#xD;").Replace("\n", "&#xA;");
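
If neither SecurityElement nor an XmlDocument is wanted, the same rules can be written out by hand. This is only a minimal sketch, assuming the attribute value will be delimited by double quotes and that the input contains no characters that are illegal in XML (those are not checked here). AttributeEscaper is a hypothetical name, not a framework class.

```csharp
using System.Text;

public static class AttributeEscaper
{
    // Escapes a string for use inside a double-quoted XML attribute value.
    // Assumption: the caller writes the attribute as name="...".
    public static string Escape(string unescaped)
    {
        if (string.IsNullOrEmpty(unescaped)) return unescaped;

        var sb = new StringBuilder(unescaped.Length + 16);
        foreach (char c in unescaped)
        {
            switch (c)
            {
                case '&': sb.Append("&amp;"); break;   // must come out as an entity
                case '<': sb.Append("&lt;"); break;    // not allowed raw in attribute values
                case '>': sb.Append("&gt;"); break;    // optional, escaped for symmetry
                case '"': sb.Append("&quot;"); break;  // the delimiter assumed here
                case '\r': sb.Append("&#xD;"); break;  // character references keep the
                case '\n': sb.Append("&#xA;"); break;  // whitespace from being normalized
                case '\t': sb.Append("&#x9;"); break;
                default: sb.Append(c); break;
            }
        }
        return sb.ToString();
    }
}
```

With this sketch, AttributeEscaper.Escape("Foo\r\nBar") yields Foo&#xD;&#xA;Bar. Apostrophes are left alone because the value is assumed to sit inside double quotes.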
Marnix van Valen
LarsH
  • @Simon, would you take your car to the mechanic and just say it "does not work"? How about specifying what actually happened? and how it differs from what you expected. Otherwise you leave us to guess what the problem is. – LarsH Dec 18 '10 at 14:51
  • "<" is escaped; "\r" and "\n" are not, as I explained in the edit to my original question. – Simon Ottenhaus Dec 18 '10 at 15:12
  • @Simon, apparently you want \r and \n to be escaped. You had not said so; you said you wanted to "Escape a string to be used in an XML-Attribute." Your edit says 'It does not work. XmlEscapeAttribute("Foo\r\nBar") will result in "Foo\r\nBar"' but that is correct behavior: the output is well-formed XML. The question then is, do you want "\r" and "\n" escaped only because you thought that was required for well-formed XML output? If so, you can use the InnerText / InnerXml technique. Otherwise, you can change the last line to `return node.InnerXml.Replace("\r", "&#xD;").Replace("\n", "&#xA;");` – LarsH Dec 20 '10 at 12:28
  • I want to escape a string to be used in an *attribute*. You can't use \r\n in an attribute string. That is the fundamental difference to string values in nodes. This is why I asked this question. – Simon Ottenhaus Dec 20 '10 at 20:13
  • 1
    @Simon, why do you say you can't use \r\n in an attribute string? Please cite a reference. When I put `` (using a line break for \n) in Oxygen XML editor and ask it to Check Well-Formedness, it says the document is well-formed. – LarsH Dec 20 '10 at 20:47
  • 2
    @Simon, FYI, the spec (http://www.xml.com/axml/target.html#NT-AttValue) says the only characters you cannot use in an attribute value are `<`, a quote (whichever kind is being used as the attribute value delimiter), and `&` (unless the latter is used in an entity reference). No doubt that's why SecurityElement.Escape() and XmlAttribute are not escaping \n and \r. – LarsH Dec 20 '10 at 21:15
  • @Simon, did the above answer your question? – LarsH Dec 23 '10 at 17:18
  • 1
    @LarsH: I'm so sorry for all the trouble I caused you. Of course, you're right: `` is valid and gets parsed just fine. I simply assumed that it can't be valid XML because it just does not look right for me and the XmlWriter Class escapes \r and \n. Thank you for your patience and good answers! – Simon Ottenhaus Dec 24 '10 at 11:40
  • @Simon, no trouble ... I didn't know either, until looking for answers to this question, that newlines were allowed directly in attribute string values. Glad your question is resolved. – LarsH Dec 24 '10 at 15:51
0
public static string XmlEscapeAttribute(string unescaped)
{
    if (string.IsNullOrEmpty(unescaped))
        return unescaped;

    var attributeString = new XAttribute("n", unescaped).ToString();

    // Extract the string from the text like: n="text".
    return attributeString.Substring(3, attributeString.Length - 4);
}

This solution is similar to the one proposed by @Mathias E., but it uses LINQ to XML rather than XmlDocument, so it should be faster.

The SecurityElement.Escape() solution has a couple of problems. First it doesn't encode new lines so that has to be done as an additional step. Also, it encodes apostrophes as &apos; which is not correct in an attribute value per the XML spec.

Inspiration for my solution came from this post.

Metalogic
  • Why do you say "`&apos;` which is not correct in an attribute value per the XML spec"? The `AttValue` production you linked to shows that it can include entity references, which are of the form `'&' Name ';'`. In fact, at https://www.xml.com/axml/target.html#syntax it explicitly says 'To allow attribute values to contain both single and double quotes, the apostrophe or single-quote character (') may be represented as "&apos;"...' – LarsH Apr 02 '20 at 12:06
  • It's also unclear what you mean by "`SecurityElement.Escape()`... doesn't encode new lines". It does encode them, as literal newline characters. Maybe you mean it doesn't encode them with a character reference? If this is a problem in some cases, it would be helpful to clarify when/why it's a problem, because it's not a problem in general. – LarsH Apr 02 '20 at 12:15
-3

If it can be of any help: in several languages, one uses createCDATASection to avoid all XML special characters.

It adds something like this :

<tag><![CDATA[ <somecontent/> ]]></tag>
Mathias E.
  • 3
    @Matthias, the OP asked about how to escape a string to be used *in an attribute*. Can you put a CDATA section in an attribute value? – LarsH Dec 16 '10 at 19:21
  • 1
    I should read the question before replying... CDATA can't be used in an attribute value. – Mathias E. Dec 17 '10 at 13:53