The code below serializes an object to an XML string, then writes the string to an XML file (yes, there is quite a bit going on with respect to UTF-8 and removal of the namespace):

var bidsXml = string.Empty;

var emptyNamespaces = new XmlSerializerNamespaces(new[] { XmlQualifiedName.Empty });

var settings = new XmlWriterSettings();
settings.Indent = true;
settings.OmitXmlDeclaration = true;

activity = $"Serialize Class INFO to XML to string";
using (MemoryStream stream = new MemoryStream())
using (StreamWriter writer = new StreamWriter(stream, Encoding.UTF8))
{
  XmlSerializer xml = new XmlSerializer(info.GetType());

  xml.Serialize(writer, info, emptyNamespaces);

  bidsXml = Encoding.UTF8.GetString(stream.ToArray());
}

var lastChar = bidsXml.Substring(bidsXml.Length);

var fileName = $"CostOffer_Testing_{DateTime.Now:yyyy.MM.dd_HH.mm.ss}.xml";

var path = $"c:\\temp\\pjm\\{fileName}";
File.WriteAllText(path, bidsXml);

The problem is that serialization to XML seems to introduce a CR/LF (newline) into the resulting string. It is easier to see when the XML file itself is opened.

A workaround is to strip out the "last" character:

bidsXml = bidsXml.Substring(0,bidsXml.Length - 1);

But it would be better to understand the root cause and resolve it without a workaround. Any idea why a newline character is being appended to the XML string?

** EDIT **

I was able to attempt a load into the consumer application (prior to this attempt I used an API to import the XML), and I received a more telling message:

The file you are loading is a binary file, the contents can not be displayed here.

So I suspect an unprintable character is somehow getting embedded into the file/XML. When I open the file in Notepad++, I see a UTF-8 byte order mark (BOM), so at least I have something to go on.

Bill Roberts
  • What's the problem? Whitespace is usually not significant in XML. – Klaus Gütter Jun 23 '21 at 19:02
  • The feedback I receive from the consumer of my XML amounts to "your XML is wrong". I don't get feedback with regard to why it's wrong. So I'm working towards making my XML as clean as I can, to eliminate as best I can any issues on my end. – Bill Roberts Jun 23 '21 at 19:13
  • "your XML is wrong" could mean: 1. It is not valid XML at all, e.g. a missing closing tag or an illegal character somewhere 2. if you have an XML schema, the XML might fail to validate against this schema 3. some implicit semantic is expected but not provided. – Klaus Gütter Jun 23 '21 at 19:17
  • That's true, thanks for the reminder. However, I don't get XML/XSD validation errors. Instead the consuming application says "General Error. Error: SQL statement to execute cannot be empty or null", and when I reached out to support, the last response was a "corrected" XML file, with no details on why my autogenerated XML failed. So again, I'm working towards resolving any seemingly minor or trivial issues that may occur on my end. – Bill Roberts Jun 23 '21 at 19:30
  • "SQL statement"?? What does this have to do with XML? – Klaus Gütter Jun 23 '21 at 19:32
  • exactly......... – Bill Roberts Jun 23 '21 at 19:32
  • Leaving aside whether terminating the XML with a trailing newline is correct, I can't reproduce the behavior you are seeing. When I copy your code into a fiddle, the last character is a '>', Unicode value = 003E. See https://dotnetfiddle.net/KAezsf. – dbc Jun 23 '21 at 20:03
  • Please see the edit I added to the post regarding: The file you are loading is a binary file, the contents can not be displayed here. – Bill Roberts Jun 23 '21 at 20:09
  • In Notepad++ I changed the encoding to UTF-8, saved the file, and the consumer application accepted it. – Bill Roberts Jun 23 '21 at 20:39
  • If you don't want a [BOM](https://en.wikipedia.org/wiki/Byte_order_mark) at the beginning of the file, use `new UTF8Encoding(false)`. See: [Force no BOM when saving XML](https://stackoverflow.com/q/24185094/3744182). Or, I believe, if you use [`XmlSerializer.Serialize(Stream, Object, XmlSerializerNamespaces)`](https://learn.microsoft.com/en-us/dotnet/api/system.xml.serialization.xmlserializer.serialize?view=net-5.0#System_Xml_Serialization_XmlSerializer_Serialize_System_IO_Stream_System_Object_System_Xml_Serialization_XmlSerializerNamespaces_) a BOM is not included. – dbc Jun 23 '21 at 20:54
  • By the way, `var lastChar = bidsXml.Substring(bidsXml.Length);` is not the correct way to get the last character in a string. C# strings are zero-indexed, so the last character is given by **`bidsXml[bidsXml.Length - 1]`**, and a string containing the last character is **`bidsXml.Substring(bidsXml.Length-1)`**. `bidsXml.Substring(bidsXml.Length)` is just an empty zero-length string. – dbc Jun 23 '21 at 20:59

1 Answer

So it seems the consumer of my XML does not want BOM (Byte Order Mark) within the stream.

After reading the article "UTF-8 BOM adventures in C#", I've updated my code to use new UTF8Encoding(false) rather than Encoding.UTF8:

var utf8NoBOM = new UTF8Encoding(false);

var bidsXml = string.Empty;

var emptyNamespaces = new XmlSerializerNamespaces(new[] { XmlQualifiedName.Empty });

// Note: settings is never passed to an XmlWriter below, so these options have no effect here.
var settings = new XmlWriterSettings();
settings.Indent = true;
settings.OmitXmlDeclaration = true;

activity = $"Serialize Class INFO to XML to string";
using (MemoryStream stream = new MemoryStream())
using (StreamWriter writer = new StreamWriter(stream, utf8NoBOM))
{
  XmlSerializer xml = new XmlSerializer(info.GetType());

  xml.Serialize(writer, info, emptyNamespaces);

  bidsXml = utf8NoBOM.GetString(stream.ToArray());
}


var fileName = $"CostOffer_Testing_{DateTime.Now:yyyy.MM.dd_HH.mm.ss}.xml";

var path = $"c:\\temp\\pjm\\{fileName}";
File.WriteAllText(path, bidsXml, utf8NoBOM);
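
For anyone who wants to see the difference for themselves, here is a minimal sketch of what I believe was happening. The class name `BomDemo`, the helper `WriteWith`, and the `<a />` payload are made up for illustration and are not part of my code above. It writes the same text through a `StreamWriter` with each encoding, dumps the raw bytes, and shows that `Encoding.UTF8.GetString` keeps the BOM as a leading U+FEFF character in the string.

```csharp
using System;
using System.IO;
using System.Text;

public static class BomDemo
{
    // Hypothetical helper: writes a tiny payload through a StreamWriter
    // using the given encoding and returns the raw bytes that were produced.
    public static byte[] WriteWith(Encoding encoding)
    {
        using (var stream = new MemoryStream())
        {
            using (var writer = new StreamWriter(stream, encoding))
            {
                writer.Write("<a />");
            } // disposing the writer flushes it and closes the stream

            // ToArray is still allowed on a closed MemoryStream.
            return stream.ToArray();
        }
    }

    public static void Main()
    {
        byte[] withBom = WriteWith(Encoding.UTF8);
        byte[] noBom = WriteWith(new UTF8Encoding(false));

        // Encoding.UTF8 emits the preamble EF-BB-BF before the payload.
        Console.WriteLine(BitConverter.ToString(withBom)); // EF-BB-BF-3C-61-20-2F-3E
        Console.WriteLine(BitConverter.ToString(noBom));   // 3C-61-20-2F-3E

        // GetString does not strip the BOM: it becomes the first char (U+FEFF).
        string decoded = Encoding.UTF8.GetString(withBom);
        Console.WriteLine(decoded[0] == '\uFEFF'); // True
    }
}
```

If that reading is right, it would explain why the original file still contained a BOM: the invisible U+FEFF character was already inside the string, so it was written out along with the XML.
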
Bill Roberts
  • And... that fixes the problem of the newline character? If not, then this isn't really an answer to the question, to which the BOM seemed to be coincidental. – Heretic Monkey Jun 23 '21 at 21:31
  • @HereticMonkey - it's not clear there was a newline to begin with, that hasn't been reproduced. – dbc Jun 23 '21 at 21:33
  • @dbc Sounds like the question should be closed as non-reproducible then... – Heretic Monkey Jun 23 '21 at 21:34