
We are having a really weird issue reading XML files from Azure Blob Storage.

We store XML files in Blob Storage. When we upload them, they look fine, and when we download them any other way, they still look fine.

BUT when we use this code to download and deserialize them:

        private Contextualizable DownloadData(CloudBlobContainer blobStorage, string filetoDownload) {
            return (Contextualizable)new XmlSerializer(documentTypeByFileName[filetoDownload], new XmlRootAttribute("Document"))
                 .Deserialize(new StringReader(this.DownloadFromBlobStorage(blobStorage, filetoDownload)));
        }

        private string DownloadFromBlobStorage(CloudBlobContainer blobStorage, string filetoDownload) {
            return blobStorage
                .GetBlockBlobReference(filetoDownload)
                .DownloadTextAsync()
                .GetAwaiter()
                .GetResult();
        }

This fails because a mysterious "?" character somehow appears BEFORE the XML prolog.
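A "?" in a console or log is not necessarily a literal question mark: output that cannot render a character, such as U+FEFF (the byte order mark, or BOM), may show it as "?". A minimal way to find out is to print the code points of the first few characters rather than the characters themselves. `DescribeLeadingChars` is a hypothetical helper (not part of any library), and `Main` simulates the download with hand-built bytes instead of calling Blob Storage.

```csharp
using System;
using System.Linq;
using System.Text;

public static class LeadingCharInspector
{
    // Hypothetical helper: returns the first `count` characters as "U+XXXX" strings,
    // so an invisible character like U+FEFF cannot hide behind a "?" glyph.
    public static string[] DescribeLeadingChars(string text, int count = 3) =>
        text.Take(count).Select(c => $"U+{(int)c:X4}").ToArray();

    public static void Main()
    {
        // Simulated download: a UTF-8 BOM (EF BB BF) followed by "<?".
        // Encoding.GetString does not strip the BOM; it decodes it to U+FEFF.
        byte[] raw = { 0xEF, 0xBB, 0xBF, (byte)'<', (byte)'?' };
        string text = Encoding.UTF8.GetString(raw);

        Console.WriteLine(string.Join(" ", DescribeLeadingChars(text)));
        // prints: U+FEFF U+003C U+003F
    }
}
```

Passing the result of DownloadFromBlobStorage (without the .Remove(0, 1) workaround) to this helper would show whether the first character is U+FEFF or a genuine "?" (U+003F).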

We have a naive fix/hack/workaround for this. We've added .Remove(0,1) to the end of DownloadFromBlobStorage as follows:

        private string DownloadFromBlobStorage(CloudBlobContainer blobStorage, string filetoDownload) {
            return blobStorage
                .GetBlockBlobReference(filetoDownload)
                .DownloadTextAsync()
                .GetAwaiter()
                .GetResult()
                .Remove(0, 1);  // We don't know why!?!?!
        }

This seems to work, but it feels hackish: we don't know where the leading question mark comes from, or what other anomalies might be lurking in our data or data processing that could later cause data corruption or data loss.
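If the stray character does turn out to be a BOM (an assumption until its code point is confirmed), a narrower alternative to the unconditional .Remove(0, 1) is to drop the first character only when it is U+FEFF, so a file saved without a BOM is never truncated. `StripLeadingBom` is a hypothetical helper name used for illustration.

```csharp
using System;

public static class BomStripper
{
    private const char ByteOrderMark = '\uFEFF';

    // Hypothetical helper: removes exactly one leading U+FEFF if present.
    // Any other first character (including a real '<') is left untouched,
    // unlike an unconditional Remove(0, 1).
    public static string StripLeadingBom(string text) =>
        !string.IsNullOrEmpty(text) && text[0] == ByteOrderMark ? text.Substring(1) : text;

    public static void Main()
    {
        Console.WriteLine(StripLeadingBom("\uFEFF<Document/>")); // prints: <Document/>
        Console.WriteLine(StripLeadingBom("<Document/>"));       // prints: <Document/>
    }
}
```

In DownloadFromBlobStorage this would mean wrapping the downloaded string in StripLeadingBom(...) instead of calling .Remove(0, 1). Substring(1) is used rather than TrimStart('\uFEFF') so that only a single mark is removed.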

We suspect an encoding problem, but I couldn't find a straightforward way to specify the encoding of the document we are downloading and parsing.
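A more robust option, sketched under the assumption that the blob's raw bytes can be obtained instead of a pre-decoded string, is to skip the string step and hand XmlSerializer a Stream. The XmlReader it creates over the stream detects the encoding from a byte order mark or the XML declaration, so a BOM is consumed as encoding metadata rather than parsed as content. `SampleDocument` is a hypothetical stand-in for the types in documentTypeByFileName, and the bytes are built in memory rather than downloaded.

```csharp
using System;
using System.IO;
using System.Linq;
using System.Text;
using System.Xml.Serialization;

// Hypothetical stand-in for one of the types in documentTypeByFileName.
public class SampleDocument
{
    public string Title { get; set; }
}

public static class StreamDeserializer
{
    // Deserializes from raw bytes instead of a decoded string. The reader that
    // XmlSerializer builds over the stream detects the encoding itself, so a
    // leading BOM is not treated as document content.
    public static T Deserialize<T>(byte[] xmlBytes)
    {
        var serializer = new XmlSerializer(typeof(T), new XmlRootAttribute("Document"));
        using (var stream = new MemoryStream(xmlBytes))
        {
            return (T)serializer.Deserialize(stream);
        }
    }

    public static void Main()
    {
        // Simulated blob content: a UTF-8 BOM (EF BB BF) followed by the XML text.
        byte[] body = Encoding.UTF8.GetBytes(
            "<?xml version=\"1.0\" encoding=\"utf-8\"?><Document><Title>Hello</Title></Document>");
        byte[] withBom = Encoding.UTF8.GetPreamble().Concat(body).ToArray();

        SampleDocument doc = Deserialize<SampleDocument>(withBom);
        Console.WriteLine(doc.Title); // prints: Hello
    }
}
```

With the CloudBlockBlob type the question appears to use, DownloadToStreamAsync (into a MemoryStream) or OpenReadAsync could plausibly supply the stream in place of DownloadTextAsync; check this against the storage SDK version in your project.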

Any ideas where this character might be coming from, and what the most reliable fix would be?

Brian Kessler
  • This could be a byte order mark: https://en.wikipedia.org/wiki/Byte_order_mark. What is the value of this byte? – Polyfun May 12 '20 at 15:48
  • As I mentioned in the question, as far as we can tell, our solution seems to be telling us it is a question mark. What would be the best way to check the value of the byte? – Brian Kessler May 12 '20 at 19:18
  • @Blindy, if you want to call this a duplicate, it would be more accurate to point to this unanswered question: https://stackoverflow.com/questions/25298355/xmlexception-while-deserializing-xml-file-in-utf-16-encoding-format – Brian Kessler May 15 '20 at 09:51
