We are having a really weird issue reading XML files from Azure Blob Storage.
We store XML files in Blob Storage. The files look fine when we upload them, and they still look fine when we download them any other way.
BUT when we use this code to download and deserialize them:
    private Contextualizable DownloadData(CloudBlobContainer blobStorage, string filetoDownload) {
        return (Contextualizable)new XmlSerializer(documentTypeByFileName[filetoDownload], new XmlRootAttribute("Document"))
            .Deserialize(new StringReader(this.DownloadFromBlobStorage(blobStorage, filetoDownload)));
    }
    private string DownloadFromBlobStorage(CloudBlobContainer blobStorage, string filetoDownload) {
        return blobStorage
            .GetBlockBlobReference(filetoDownload)
            .DownloadTextAsync()
            .GetAwaiter()
            .GetResult();
    }
Deserialization fails because a mysterious "?" somehow gets inserted BEFORE the XML prolog.
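To find out what that character actually is (a literal question mark, or something that merely displays as one), we could print its code point. The following is only a sketch: `FirstCharProbe`, `DescribeFirstChar`, and `DecodeSampleWithPreamble` are hypothetical names that are not in our code base, and the sample is built locally rather than downloaded from Blob Storage. It shows one way such a character can arise: decoding bytes that begin with the UTF-8 preamble yields U+FEFF as the first character. We have not confirmed that this is what happens with our blobs.

```csharp
using System;
using System.Text;

public static class FirstCharProbe
{
    // Hypothetical helper: formats the first UTF-16 code unit of a string as U+XXXX.
    public static string DescribeFirstChar(string text)
    {
        if (string.IsNullOrEmpty(text)) return "(empty)";
        return "U+" + ((int)text[0]).ToString("X4");
    }

    // Builds a local sample: the UTF-8 preamble bytes followed by a small XML
    // document, decoded with Encoding.UTF8.GetString (which keeps the preamble
    // as a character instead of stripping it).
    public static string DecodeSampleWithPreamble()
    {
        byte[] preamble = Encoding.UTF8.GetPreamble();
        byte[] body = Encoding.UTF8.GetBytes("<?xml version=\"1.0\"?><Document/>");
        byte[] bytes = new byte[preamble.Length + body.Length];
        preamble.CopyTo(bytes, 0);
        body.CopyTo(bytes, preamble.Length);
        return Encoding.UTF8.GetString(bytes);
    }
}
```

Calling `FirstCharProbe.DescribeFirstChar(...)` on the string returned by `DownloadFromBlobStorage` would tell us whether the leading character is U+003F (a real "?") or something else, such as U+FEFF.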
We have a naive fix/hack/workaround for this. We've added .Remove(0, 1) to the end of DownloadFromBlobStorage as follows:
    private string DownloadFromBlobStorage(CloudBlobContainer blobStorage, string filetoDownload) {
        return blobStorage
            .GetBlockBlobReference(filetoDownload)
            .DownloadTextAsync()
            .GetAwaiter()
            .GetResult()
            .Remove(0, 1); // We don't know why!?!?!
    }
This seems to work, but it feels hackish: we don't know where the initial question mark comes from, or what other anomalies there might be in our data or data processing that could later result in data corruption or data loss.
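One thing that worries us about the unconditional `.Remove(0, 1)` is that it would eat the opening `<` of any file that does not have the stray character. A guarded variant might look like the sketch below. It assumes, without our having verified it, that the stray character is U+FEFF (a byte order mark); `BomStripper` and `StripLeadingBom` are hypothetical names, not existing code.

```csharp
using System;

public static class BomStripper
{
    // Assumption: the unwanted leading character is U+FEFF.
    private const char ByteOrderMark = '\uFEFF';

    // Removes a single leading U+FEFF if present; otherwise returns the text unchanged.
    public static string StripLeadingBom(string text)
    {
        if (!string.IsNullOrEmpty(text) && text[0] == ByteOrderMark)
        {
            return text.Substring(1);
        }
        return text;
    }
}
```

In `DownloadFromBlobStorage` this would replace `.Remove(0, 1)`, wrapping the downloaded string in `BomStripper.StripLeadingBom(...)`. It is still a workaround rather than an explanation, but at least it leaves well-formed input untouched.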
We suspect an encoding problem, but I couldn't find any straightforward way to specify the encoding of the document we are downloading and parsing.
Any ideas where the character might be coming from, and what the most reliable way to fix this would be?