Removing and to get valid XML?

Question

I have a WCF service (.NET C#) that sometimes returns for example  and  which is not correct XML.

I guess I could build a translator that is applied to each string field before sending the response, but it feels a bit sketchy: I do not know what to look for (beyond the above) or what to translate it into. Maybe there is an existing solution for this?

Does this help: https://stackoverflow.com/questions/44765194/how-to-parse-invalid-bad-not-well-formed-xml If you are aware of certain elements/tags that contain these literals, you could wrap them in CData tags — Gokul Panigrahi, Nov 09 '22 at 09:32
It depends on what XML will be used for can be replaced by \n or — Viliam, Nov 09 '22 at 09:35
XML has a limited number of characters that get escaped starting with an ampersand. When the XML is embedded inside an HTTP request/response, additional characters get escaped. You are using WCF, which is HTTP. The XML is CORRECT. You need to use System.Net.WebUtility.HtmlDecode(string) to resolve the issue. See the following for more info: https://en.wikipedia.org/wiki/List_of_XML_and_HTML_character_entity_references?force_isolation=true — jdweng, Nov 09 '22 at 10:01
@jdweng, in this case the receiving system can't handle the response and I don't think they can use HTMLDecode(Biztalk). — Banshee, Nov 09 '22 at 10:29
It is C# code. If not C#, then the HTML parser should automatically remove the encoding, unless they are trying to use a kludge like parsing HTML with Regex. Sounds like the developers are using the wrong tools. — jdweng, Nov 09 '22 at 10:43
I have tried using Postman to get to see the response. Postman tags the response as XML but it still contains so no HTMLDecoding is done. If I however turn to preview tab it changes the to proper chars so it seems like you might be right. If so, the receiving system is not decoding the HTML properly. — Banshee, Nov 09 '22 at 11:02

score 0 · Answer 1 · answered Nov 09 '22 at 09:43

&#xD; is the character reference for a carriage return, as far as I know, so it marks a line break in much the same way as Environment.NewLine. You can therefore replace it easily:

string text = yourString.ToString().Replace("&#xD;", Environment.NewLine);

I don't know if this is what you're searching for, but that's the only thing that comes to mind right now.

Hope it helps. :)

answered Nov 09 '22 at 09:43

Maurice Preiß


Yes, that is one solution, but I guess there might be more special characters like this that need to be handled, and I cannot update the services very often. – Banshee Nov 09 '22 at 10:09
So, as you wrote, you have to look up the meaning of all the special characters and write a converter. – Maurice Preiß Nov 10 '22 at 12:02
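
On the "write a converter" idea: such a converter probably does not need a hand-maintained table of special characters. The following is a minimal C# sketch, assuming the goal is simply to drop every character that XML 1.0 forbids from a string field before the response is serialized. The names XmlTextSanitizer and RemoveInvalidXmlChars are hypothetical; XmlConvert.IsXmlChar is the framework method that tests a character against the XML 1.0 character range. Carriage return and line feed are legal XML 1.0 characters, so the sketch leaves them alone.

```csharp
using System.Text;
using System.Xml;

public static class XmlTextSanitizer
{
    // Hypothetical helper: returns the input without characters that
    // XML 1.0 does not allow (for example U+001E).
    public static string RemoveInvalidXmlChars(string text)
    {
        if (string.IsNullOrEmpty(text))
        {
            return text;
        }

        var sb = new StringBuilder(text.Length);
        for (int i = 0; i < text.Length; i++)
        {
            char c = text[i];

            // A well-formed surrogate pair encodes a code point above U+FFFF,
            // all of which are legal in XML 1.0, so keep both halves.
            if (char.IsHighSurrogate(c) && i + 1 < text.Length && char.IsLowSurrogate(text[i + 1]))
            {
                sb.Append(c).Append(text[i + 1]);
                i++;
            }
            else if (!char.IsSurrogate(c) && XmlConvert.IsXmlChar(c))
            {
                sb.Append(c);
            }
            // Anything else (control characters, lone surrogates) is dropped.
        }

        return sb.ToString();
    }
}
```

Dropping characters is lossy, so this only fits if the characters carry no meaning for the receiving system. It would have to be called on each string field before the data contract is returned.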

score 0 · Accepted Answer · answered Nov 09 '22 at 11:39

These characters are allowed in XML 1.1 but not in XML 1.0. XML 1.1 has not been a great success and Microsoft has never supported it.

Does the XML declaration at the start of the file say version="1.1"?

A clean way to handle this would be to process the file using a parser that does support XML 1.1, converting it to XML 1.0 in the process. For example, you could do this with a simple Java SAX application, or XSLT if you prefer.

Quite what you want to translate these characters into is largely up to you. It depends whether they have any significance. If you want to translate them losslessly into XML 1.0, you could convert them to processing instructions such as <?char x1E?>.
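
For readers who have to stay on .NET, here is a rough C# sketch of the processing-instruction idea. It is not an XML 1.1 parser. It assumes instead that the offending characters arrive as character references (such as &#x1E;), that the document carries no version="1.1" declaration, and that setting XmlReaderSettings.CheckCharacters to false is enough for the reader to accept those references. The names Xml10Converter and ReplaceInvalidCharsWithPis are hypothetical, and the "char" target simply follows the example above.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Text;
using System.Xml;
using System.Xml.Linq;

public static class Xml10Converter
{
    // Hypothetical helper: rewrites characters that XML 1.0 forbids in
    // text nodes as processing instructions such as <?char x1E?>.
    public static string ReplaceInvalidCharsWithPis(string xml)
    {
        // CheckCharacters = false turns off validation of character
        // references, so &#x1E; is read instead of causing an exception.
        var settings = new XmlReaderSettings { CheckCharacters = false };
        XDocument doc;
        using (var reader = XmlReader.Create(new StringReader(xml), settings))
        {
            doc = XDocument.Load(reader);
        }

        // Snapshot the text nodes first, because the tree is modified below.
        foreach (XText textNode in doc.DescendantNodes().OfType<XText>().ToList())
        {
            if (textNode.Value.All(IsAllowed))
            {
                continue;
            }
            textNode.ReplaceWith(Split(textNode.Value).ToArray());
        }

        // Attribute values are not handled in this sketch; an invalid
        // character left in an attribute may make serialization throw.
        return doc.ToString(SaveOptions.DisableFormatting);
    }

    // Surrogates are passed through, assuming they form well-formed pairs.
    private static bool IsAllowed(char c)
    {
        return XmlConvert.IsXmlChar(c) || char.IsSurrogate(c);
    }

    // Splits a text value into runs of legal text and one processing
    // instruction per forbidden character.
    private static List<object> Split(string value)
    {
        var parts = new List<object>();
        var run = new StringBuilder();
        foreach (char c in value)
        {
            if (IsAllowed(c))
            {
                run.Append(c);
                continue;
            }
            if (run.Length > 0)
            {
                parts.Add(run.ToString());
                run.Clear();
            }
            parts.Add(new XProcessingInstruction("char", "x" + ((int)c).ToString("X")));
        }
        if (run.Length > 0)
        {
            parts.Add(run.ToString());
        }
        return parts;
    }
}
```

The receiving system would still need to understand (or safely ignore) the processing instructions, so whether this is worth it depends on whether the characters have any significance.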

answered Nov 09 '22 at 11:39

Michael Kay


It starts with something like this : , the header says text/xml;charset=utf-8, Microsoft-HTTPAPI/2.0 so no version is stated. – Banshee Nov 09 '22 at 11:50
If there's no XML declaration then you may have to add one before parsing, or you may be able to tell the parser what version of XML to assume via some API, or it may just accept XML 1.1 by default. – Michael Kay Nov 09 '22 at 12:05
