2

Consider the following code:

private XmlDocument CreateMessage(string dirtyInput)
{
    XmlDocument xd = new XmlDocument();
    // dirtyInput is substituted into the markup as-is, with no escaping
    string str = @"<Message><Request>%REQ%</Request></Message>";
    str = str.Replace("%REQ%", dirtyInput);
    xd.LoadXml(str);
    return xd;
}

What steps should I take to sanitize/validate this dirtyInput string (it can come from untrusted sources)?

EDIT:

To provide a bit more context, this XML "message" is then sent (by me) to a third-party web service. I am mostly concerned with mitigating the risk that someone could pass me a string that exploits a vulnerability in my XML parser, or perhaps in the parser on the target [third-party] end (to whom I am sending this message). So clearly I could focus on special XML characters like < > & etc. -- do I also need to worry about escaped/encoded forms of those characters? Is the SecurityElement.Escape method mentioned in the possible dupe link adequate for this?

mikey
  • Check this possible duplicate of your question: [link](http://stackoverflow.com/questions/8331119/escape-invalid-xml-characters-in-c-sharp). – Re Captcha Feb 27 '14 at 13:18
  • 1
    Thanks - Added an edit section to clarify. I did see that dupe question, but was still uncertain. – mikey Feb 27 '14 at 13:33
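
For reference, here is a minimal sketch of what `SecurityElement.Escape` does with the special characters the question asks about. The class name `EscapeDemo` and the helper `HasOnlyXmlChars` are hypothetical names introduced here for illustration. The helper is an assumption about what an additional check could look like, not something proposed in the thread. It relies on `XmlConvert.IsXmlChar`, which inspects one UTF-16 code unit at a time, so characters outside the Basic Multilingual Plane (surrogate pairs) would be rejected by this simple version.

```csharp
using System;
using System.Linq;
using System.Security;
using System.Xml;

public static class EscapeDemo
{
    // Replaces < > & " ' with the corresponding XML entity references.
    public static string EscapeForXml(string input)
    {
        return SecurityElement.Escape(input);
    }

    // Hypothetical extra check: true only if every UTF-16 code unit is a
    // character allowed in XML 1.0. Surrogate pairs are not handled here.
    public static bool HasOnlyXmlChars(string input)
    {
        return input.All(c => XmlConvert.IsXmlChar(c));
    }
}
```

Because `&` is itself escaped, input that already contains an entity such as `&lt;` comes out as `&amp;lt;`, so pre-encoded forms end up as literal text rather than markup. Escaping does not remove characters that XML forbids outright (a NUL, for example), which is what the second helper is meant to catch.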

2 Answers

5

Since you're generating an XmlDocument, you could rely on the DOM methods to handle all escaping for you:

private XmlDocument CreateMessage(string dirtyInput)
{
    XmlDocument xd = new XmlDocument();
    xd.LoadXml(@"<Message><Request></Request></Message>");
    xd["Message"]["Request"].InnerText = dirtyInput;

    return xd;
}
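
To illustrate the effect, here is a small self-contained sketch built around the same method. The wrapper class `MessageDemo` and the `Serialize` helper are hypothetical names added for the example. Input that looks like markup should stay inside `Request` as a single text node rather than adding elements to the tree.

```csharp
using System;
using System.Xml;

public static class MessageDemo
{
    // Same approach as above: load a fixed skeleton, then assign the
    // untrusted value through the DOM so it is treated as text.
    public static XmlDocument CreateMessage(string dirtyInput)
    {
        XmlDocument xd = new XmlDocument();
        xd.LoadXml("<Message><Request></Request></Message>");
        xd["Message"]["Request"].InnerText = dirtyInput;
        return xd;
    }

    // Hypothetical helper: returns the serialized document, in which the
    // markup characters of the input appear as entity references.
    public static string Serialize(string dirtyInput)
    {
        return CreateMessage(dirtyInput).OuterXml;
    }
}
```

Calling `MessageDemo.CreateMessage("</Request><Evil/>")` leaves `Message` with exactly one child element, and reading `InnerText` back returns the original string unchanged.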
Phylogenesis
0

It depends on the environment this string is going to be used in (Web? Database? ...)

If it is the web and you're trying to prevent XSS, this will do the trick:

 HttpUtility.HtmlEncode(dirtyInput);

For databases, I'd forgo sanitization in favour of parameterized queries.

As mentioned in the comments, you should wrap the dirtyInput in a Character Data section:

 <![CDATA[
   ...
 ]]>
Mister Epic
  • This is close, but not entirely accurate. Lots of HTML entities are invalid in XML. If you're doing this, you should specify that the data is raw character data by surrounding it with `<![CDATA[...]]>`. – Phylogenesis Feb 27 '14 at 13:24
  • Thanks Chris, I do use HtmlEncode often for XSS -- this "message" is not going out as a web response, it is actually being sent as a web service request to a third party. My concern is that an attacker could perhaps provide input that would throw the XML parser off. I may go with something like figure 6 in the following link http://msdn.microsoft.com/en-us/magazine/ee335713.aspx along with validating my input for white-listed characters and a specific length. – mikey Feb 27 '14 at 13:39
  • 1
    CDATA sections don't help at all. You still blow up if the content has `]]>` in it. And the way around that (multiple CDATA Sections) is more confusing than just doing the proper XML escape in the first place. – bobince Feb 27 '14 at 15:27
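
The point in the last comment can be sketched as follows. `CDataDemo` and `WrapInCData` are hypothetical names introduced for illustration. The method pastes the input between CDATA delimiters by string concatenation, which is an assumption about how the CDATA suggestion would be applied to the code in the question.

```csharp
using System;
using System.Xml;

public static class CDataDemo
{
    // Naive approach: paste the untrusted input between CDATA delimiters.
    // Input containing "]]>" closes the section early, so whatever follows
    // it is parsed as ordinary markup.
    public static XmlDocument WrapInCData(string dirtyInput)
    {
        XmlDocument xd = new XmlDocument();
        xd.LoadXml("<Message><Request><![CDATA[" + dirtyInput + "]]></Request></Message>");
        return xd;
    }
}
```

With the input `]]></Request><Injected/><Request><![CDATA[` the document still parses, but `Message` ends up with three child elements instead of one. The CDATA wrapper on its own therefore does not stop injection.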