
Background

If I read the W3C XML spec (https://www.w3.org/TR/REC-xml/#sec-line-ends) correctly, it is not valid to return CR LF (\r\n) or a lone CR (\r) inside an XML element value to represent a newline. It must be just LF (\n).

There are some references to DataContractSerializer "handling this automatically", and I can see that it makes the following replacement: \r\n => &#xD;\n (i.e. the CR is escaped to its character entity, leaving only the LF literal).
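To see this outside WCF, here is a minimal sketch that serializes a string with `DataContractSerializer.WriteObject(Stream, object)` and then reads it back. `DcsLineEndingDemo`, `Serialize` and `RoundTrip` are hypothetical names for illustration only. The assumption is that the stream overload writes through the serializer's own UTF-8 text writer, much like WCF's text encoder does, so the CR should come out as the `&#xD;` entity and the string should round-trip unchanged.

```csharp
using System.IO;
using System.Runtime.Serialization;
using System.Text;

// Hypothetical demo class (names are illustrative, not from the service).
public static class DcsLineEndingDemo
{
    // Serializes a string with DataContractSerializer's Stream overload,
    // which writes UTF-8 through the serializer's own XML text writer
    // (no XmlWriterSettings involved), and returns the XML as text.
    public static string Serialize(string value)
    {
        var serializer = new DataContractSerializer(typeof(string));
        using (var stream = new MemoryStream())
        {
            serializer.WriteObject(stream, value);
            return Encoding.UTF8.GetString(stream.ToArray());
        }
    }

    // Serializes and then deserializes the string, to check what a reader gets back.
    public static string RoundTrip(string value)
    {
        var serializer = new DataContractSerializer(typeof(string));
        using (var stream = new MemoryStream())
        {
            serializer.WriteObject(stream, value);
            stream.Position = 0; // rewind before reading the XML back
            return (string)serializer.ReadObject(stream);
        }
    }
}
```

Under that assumption, `DcsLineEndingDemo.Serialize("line1\r\nline2")` should contain `line1&#xD;` followed by a literal LF, with no raw CR, and `RoundTrip` should return the original `"line1\r\nline2"`. The escaped CR survives parsing, whereas a literal CR LF would be normalized to LF by the reader.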

However, when my WCF service response (WCF uses DataContractSerializer) serialises a string containing CRLF as &#xD;\n, as above, some non-Windows client applications (not in my control) fail to deserialise the XML because "it is not valid XML" (their claim). They claim that \n is valid but &#xD;\n is not.
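For comparison, this is roughly how a conforming XML processor treats the forms in question, using .NET's `XDocument.Parse` as one example of a parser that applies the spec's end-of-line handling. Other parsers should behave the same if they follow that section, but the clients' parsers are not tested here. `LineEndDemo.ParseValue` is a hypothetical helper.

```csharp
using System.Xml.Linq;

// Hypothetical helper for illustration: parse an XML string and return the root element's text.
public static class LineEndDemo
{
    public static string ParseValue(string xml)
    {
        // XDocument.Parse reads through an XmlReader, which applies XML line-end
        // normalization: a literal "\r\n" or lone "\r" in the input becomes "\n".
        // A character reference such as &#xD; is not a literal CR, so it is not normalized.
        return XDocument.Parse(xml).Root.Value;
    }
}
```

`ParseValue("<a>line1\r\nline2</a>")` returns `"line1\nline2"`, `ParseValue("<a>line1\nline2</a>")` also returns `"line1\nline2"`, and `ParseValue("<a>line1&#xD;\nline2</a>")` returns `"line1\r\nline2"`. All three inputs are well-formed; they differ only in whether a CR reaches the application.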

Whether or not it really is valid XML is not the issue: the clients cannot be changed, but the service can, so I need to do something on the service side to convert \r\n => \n.

My issue: if I manually remove the CR, the WCF service response seems to put it back in!

So, in the class that is serialised to build the response to a WCF method call, I did a simple replace within the strings, using an extension method as follows:

```csharp
[DataMember(IsRequired = true, Order = 3)]
public string Address
{
    get { return Data.Address.MakeXmlSafe(); }
    internal set { }
}

public static string MakeXmlSafe(this string source)
{
    if (string.IsNullOrEmpty(source))
        return source;

    // In XML it is not valid to return CR LF (\r\n) or just CR (\r) as a newline. It must be just LF (\n).
    return source.Replace("\r\n", "\n").Replace("\r", "\n");
}
```

(FYI, the service is .NET Framework 4.6)

Debugging the service, I can see the replacement being made correctly. HOWEVER, if I check the response in a tool such as SoapUI or the WCF Service Trace Viewer Tool (SvcTraceViewer.exe), the newlines are represented as CR LF (\r\n).

For sanity, I also tested that manually serialising with DataContractSerializer preserves the LF with no CR, and it does: [screenshot: Locals window]

I don't know whether the tools (being Windows applications) are automatically replacing LF with CR LF in order to render it, or whether WCF is doing a replacement after DataContractSerializer runs. If it is WCF, what can I do to guarantee the conversion \r\n => \n (and not \r\n => &#xD;\n)?

Laurence
    What do you mean by "it is not valid to return CR LF"? Return from what? The spec makes it clear that CR LF is a valid line ending within an XML document - it's just that the XML processor must behave as if it normalizes these to LF. – Jon Skeet Nov 08 '21 at 11:28
  • White space in XML is ignored between tags. Inside tags there are only 5 invalid characters: 1) double quote 2) ampersand 3) single quote 4) less-than sign (open bracket) 5) greater-than sign (closing bracket). Any other restrictions are based on the schema and can differ depending on how the XML is used. – jdweng Nov 08 '21 at 12:29
  • The character(s) used for a newline aren't controlled by `DataContractSerializer`, they are controlled by `XmlWriter`, specifically [`XmlWriterSettings.NewLineChars`](https://learn.microsoft.com/en-us/dotnet/api/system.xml.xmlwritersettings.newlinechars?view=net-5.0). See [How can I generate XML with CR, instead of CRLF in XmlTextWriter](https://stackoverflow.com/q/3415294/3744182). If you want to use LF instead of CRLF you will need to create an `XmlWriter` yourself and serialize using it. – dbc Nov 08 '21 at 15:28
  • To control `XmlWriterSettings` for a WCF service, I *think* [WCF and XmlSerialization and XmlWriterSettings](https://stackoverflow.com/q/7616060/3744182) is what you need, it points to [Custom Message Encoder: Custom Text Encoder](https://learn.microsoft.com/en-us/dotnet/framework/wcf/samples/custom-message-encoder-custom-text-encoder). But I've never tried it myself. – dbc Nov 08 '21 at 15:43
  • @JonSkeet - whether it is valid or not (you are probably right) is actually redundant for me. The clients consuming XML data from my service fail deserialising the data, and claim that if \r\n was replaced with \n that would work. How to do that is what my question is really about. – Laurence Nov 08 '21 at 16:05
  • @dbc thanks, this looks promising .... will look into it. – Laurence Nov 08 '21 at 16:16
  • "whether it is valid or not (you are probably right) is actually redundant for me" - I don't think it should be. Even if you have to work round it for now, if you're trying to work with broken peers, you should make sure the owners accept that the code *is* broken and commit to fixing it in the future. That way all the *other* users of the same client in the future don't end up with the same problem. That said, it's odd to see `&#xD;\n` instead of `\r\n`. It's possible that a *normal* CRLF is fine, and it's just a half-encoded one that is causing problems. – Jon Skeet Nov 08 '21 at 16:28
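Following dbc's suggestion, here is a minimal sketch of serializing with `DataContractSerializer` through an `XmlWriter` created with explicit `XmlWriterSettings`, so that `NewLineHandling.Replace` together with `NewLineChars = "\n"` turns each CR LF (or lone CR) in text content into a plain LF instead of `&#xD;`. `LfOnlySerializer` is a hypothetical name, and this only demonstrates the writer settings. Plugging them into a WCF endpoint would need something like the custom text message encoder from the sample dbc links, which is not shown here.

```csharp
using System.Runtime.Serialization;
using System.Text;
using System.Xml;

// Hypothetical helper: serialize through an XmlWriter we create ourselves,
// so XmlWriterSettings (not the serializer) decides how newlines are written.
public static class LfOnlySerializer
{
    public static string Serialize<T>(T value)
    {
        var settings = new XmlWriterSettings
        {
            // Replace "\r\n" and lone "\r" in element text with NewLineChars...
            NewLineHandling = NewLineHandling.Replace,
            // ...and make NewLineChars a single LF.
            NewLineChars = "\n",
            OmitXmlDeclaration = true
        };

        var builder = new StringBuilder();
        using (var writer = XmlWriter.Create(builder, settings))
        {
            new DataContractSerializer(typeof(T)).WriteObject(writer, value);
        } // disposing the writer flushes the output into the StringBuilder
        return builder.ToString();
    }
}
```

With these settings, `LfOnlySerializer.Serialize("line1\r\nline2")` should produce `line1\nline2` inside the `<string>` element, with neither a raw CR nor `&#xD;`. The trade-off is that the CR is dropped for good: a conforming reader will hand back `"line1\nline2"`, not the original string, which is the behaviour the clients are asking for.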
