How to remove the namespace and use UTF-8 (no BOM) encoding on an XML file

Question

I need to create a file with no namespace, encoded as UTF-8 without a BOM, so that a WMS can read it. (I had to add the namespace for the mapping because the target schema isn't unique.)

I created a custom send pipeline that assembles the XML and then removes the namespace (using the ESB Remove Namespace component).

I have set it up in a way that I would expect to remove the BOM, but when I check the outbound file, it has changed to an ANSI file (even though I explicitly set the encoding to UTF-8 in the pipeline component).

Am I doing something wrong? Is there a better alternative?

Accepted Answer (score 3)


The pipeline component is probably working fine and already doing its job of removing the BOM and encoding to UTF-8.

Your second screenshot shows Notepad++. The "Encode in" feature of Notepad++ lets you display the content of your file in a particular encoding.

However, it is not an encoding detector. Detecting an encoding can be difficult, especially when the file has no BOM, because some encodings overlap (for example, the first 128 characters of UTF-8 are encoded exactly as in ASCII).
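To illustrate that overlap, here is a small Python sketch (Python is used only for illustration; BizTalk itself is .NET). It assumes "ANSI" means Windows-1252 (`cp1252`), the usual default code page on Western-locale Windows, and uses a made-up XML snippet. For pure-ASCII content, UTF-8 without a BOM and Windows-1252 produce byte-identical files, so an editor cannot tell them apart.

```python
import codecs

# Made-up XML containing only ASCII characters (assumption for illustration).
xml_text = '<?xml version="1.0" encoding="utf-8"?><Order><Id>42</Id></Order>'

utf8_bytes = xml_text.encode("utf-8")          # UTF-8, no BOM
ansi_bytes = xml_text.encode("cp1252")         # Windows-1252, assumed to be "ANSI" here
utf8_bom_bytes = xml_text.encode("utf-8-sig")  # UTF-8 with a BOM (EF BB BF) prepended

# Pure-ASCII text: the two files are byte-for-byte identical.
print(utf8_bytes == ansi_bytes)                # True
# The only difference a BOM makes is the three leading bytes.
print(utf8_bom_bytes[:3] == codecs.BOM_UTF8)   # True
print(utf8_bom_bytes[3:] == utf8_bytes)        # True
```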

Does your input file actually contain any character that is encoded in a way unique to UTF-8 (that is, any non-ASCII character)? That would be a good test.
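One way to run that test, sketched in Python with a made-up sample: include a non-ASCII character such as "é", which takes two bytes in UTF-8 but one byte in Windows-1252. The `describe_encoding` helper is hypothetical and only a rough heuristic (bytes that happen to be valid UTF-8 could still have been meant as another encoding); it is not a real encoding detector.

```python
import codecs

def describe_encoding(data: bytes) -> str:
    """Rough heuristic only, not a real encoding detector."""
    if data.startswith(codecs.BOM_UTF8):
        return "UTF-8 with BOM"
    try:
        data.decode("ascii")
        # No byte above 0x7F: indistinguishable between UTF-8 (no BOM) and ANSI.
        return "pure ASCII (ambiguous)"
    except UnicodeDecodeError:
        pass
    try:
        data.decode("utf-8")
        return "UTF-8 without BOM"
    except UnicodeDecodeError:
        return "not valid UTF-8 (possibly an ANSI code page)"

print("é".encode("utf-8"), "é".encode("cp1252"))  # b'\xc3\xa9' b'\xe9'

sample = "<Name>Café</Name>"
print(describe_encoding(sample.encode("utf-8")))      # UTF-8 without BOM
print(describe_encoding(sample.encode("utf-8-sig")))  # UTF-8 with BOM
print(describe_encoding(sample.encode("cp1252")))     # not valid UTF-8 (possibly an ANSI code page)
print(describe_encoding(b"<Name>Cafe</Name>"))        # pure ASCII (ambiguous)
```

If the outbound file contains such a character and the check reports UTF-8 without a BOM, the pipeline is doing its job and the "ANSI" label is just the editor's guess.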

edited Nov 06 '15 at 15:42

answered Nov 06 '15 at 15:28

Also try setting the TargetCharSet in the Assembler. (Note: you have to do this in Visual Studio and deploy, rather than in the BizTalk Administration console, due to a bug known since BizTalk 2006: https://support.microsoft.com/en-us/kb/939550) – Dijkgraaf Nov 08 '15 at 19:43
Sorry for the slow reply. If I use the output file I generate, the system sees it as an "x"-separated file. If I use Notepad++'s "Convert to UTF-8" function and rerun the file, the system recognizes it as an XML file and processes it perfectly. Notepad++ also moves the encoding check mark from "UTF-8-BOM" to "UTF-8" (I'm not sure if that's relevant). – Andy Nov 10 '15 at 08:45
@Andy By "system", you mean whatever system is consuming your output? The manipulation you just described with Notepad++ seems to only add the BOM. If it works well that way, why did you originally want to remove the BOM? – Nov 10 '15 at 13:04
@GaryGardet "System" is indeed the system that reads the XML I create. It cannot handle BOMs and rejects files that have them, which is why I want to remove the BOM. The manipulation I just described was removing the BOM. – Andy Nov 10 '15 at 14:40
However, my local Notepad++ and the Notepad++ on the cloud machine word things differently. The cloud one looks like the screenshot and offers "UTF-8" / "UTF-8 without BOM"; the local one offers "UTF-8" / "UTF-8-BOM". I had never noticed this before. – Andy Nov 10 '15 at 14:41
@Andy Oh right, your manipulation removes the BOM. If you are still set up as in your first screenshot, I don't see why the BOM would not be removed. Setting "Preserve BOM" to false in the XML Assembler should be enough to make sure you don't get a BOM, and in my experience it works. So either the ESB Remove Namespace component does something weird, or something is misconfigured. Maybe start by testing without the ESB component and experiment from there. – Nov 11 '15 at 16:21

Solved this problem by removing the need for the "Remove Namespace" component. I can only assume you are right and that the custom component breaks the encoding setting (both its own and that of the preceding components, though that is likely because it always writes UTF-8 with a BOM). – Andy Nov 18 '15 at 09:51
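For comparison, the two steps the pipeline was meant to perform (strip the namespace, then encode as UTF-8 without a BOM) can be sketched in Python with the standard library. This is not the ESB Remove Namespace component or the BizTalk XML Assembler; it only shows that whether a BOM appears depends on the encoder chosen at the final serialization step, which fits the guess that the custom component always writes UTF-8 with a BOM. (In .NET the analogous choice is `new UTF8Encoding(false)`, which emits no BOM, versus `Encoding.UTF8`, whose preamble is a BOM.) The helper names and the namespace URI are hypothetical.

```python
import xml.etree.ElementTree as ET

def strip_namespaces(root: ET.Element) -> None:
    """Drop the {namespace-uri} part of every element tag and attribute name, in place.

    Assumes no two names collapse to the same local name after stripping.
    """
    for elem in root.iter():
        # Comments/processing instructions have non-string tags; skip them.
        if isinstance(elem.tag, str) and elem.tag.startswith("{"):
            elem.tag = elem.tag.split("}", 1)[1]
        # Copy the keys first because the attribute dict is modified in the loop.
        for key in list(elem.attrib):
            if key.startswith("{"):
                elem.attrib[key.split("}", 1)[1]] = elem.attrib.pop(key)

def to_utf8_without_bom(root: ET.Element) -> bytes:
    """Serialize with an XML declaration (xml_declaration= needs Python 3.8+).

    The "utf-8" codec never emits a BOM, unlike "utf-8-sig".
    """
    return ET.tostring(root, encoding="utf-8", xml_declaration=True)

# Hypothetical mapped output; the namespace URI is made up for illustration.
mapped = (
    b'<ns0:Order xmlns:ns0="http://example.com/hypothetical-schema">'
    b"<ns0:Id>42</ns0:Id></ns0:Order>"
)
root = ET.fromstring(mapped)
strip_namespaces(root)
out = to_utf8_without_bom(root)

print(out[:3] == b"\xef\xbb\xbf")  # False: no BOM at the start
print(b"xmlns" in out)             # False: namespace declaration is gone
```

Because ElementTree only writes `xmlns` declarations for tags that still carry a `{uri}`, stripping every tag is enough to drop the declaration from the output.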

