I am trying to convert binary data (an Excel XLS file) into data I can pass as part of an HTTP multipart request. The code that generates the multipart requests has been working with string data for some time, so I think the issue is the encoding of the binary data, not how the request is formed.

For reference, the first seven characters of the Excel file (when viewed in Notepad++) are:

char    decimal  hex
Ð       208      D0
Ï       207      CF
(ctrl)  17       11
à       224      E0
¡       161      A1
±       177      B1
(ctrl)  26       1A

I'm setting element.BinaryContent using

binaryContent = System.IO.File.ReadAllBytes(filePath);

If I then use

content.Append(Encoding.UTF8.GetString(element.BinaryContent));

to create the HTTP request content this gives (from the Immediate Window in VS):

[Image: binary content converted to UTF8]

In the uploaded file the control chars and the English characters are retained correctly but other characters are converted to incorrect values.

If I'm not explaining this well, the image below shows the data as uploaded on the left and the original on the right.

[Image: comparison of files before and after upload]

For reference, this is how I make the request:

    protected static void SetRequestContent(string requestContent, HttpWebRequest request, string contentType)
    {
        request.ContentType = contentType;

        byte[] byteData = UTF8Encoding.UTF8.GetBytes(requestContent);

        using (Stream postStream = request.GetRequestStream())
        {
            postStream.Write(byteData, 0, byteData.Length);
        }
    }

where requestContent has correctly-formed multi-part content, like this:

--------------------------BNDY
Content-Disposition: form-data; name=password
Txxxxx
--------------------------BNDY
Content-Disposition: form-data; name=username
a@b.com
--------------------------BNDY
Content-Disposition: form-data;name=\"FILE\";filename=\"c:/POSTOutput/Upload.xls\"
��\u0011\u0871\u001a�\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0>\0\u0003\0��\t\0\u0006\0\0\0\0\0\0\0\0\0\0\0\u0001\0\0\0\u0001\0\0\0\0\0\0\0\0\u0010\0\0\u0002\0\0\0\u0001\0\0\0����\0\0\0\0\0\0\0\0������������ (etc). 

What do I need to do to pass the data through in its original format?

Badgerspot

1 Answer

This line of your code

content.Append(Encoding.UTF8.GetString(element.BinaryContent));

assumes that arbitrary byte sequences (from a binary file with no inherent character encoding) can always be converted to a Unicode string. This is not the case. According to the documentation, byte sequences that are not valid UTF-8 may be replaced by a substitution (fallback) character or may even cause an ArgumentException. Either way, the original bytes cannot be recovered from the resulting string.
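To make the loss visible, here is a minimal sketch that pushes the seven header bytes listed in the question through the same decode and re-encode steps. The class and method names are made up for illustration; only the `Encoding.UTF8` calls come from your code.

```csharp
using System;
using System.Linq;
using System.Text;

public static class Utf8RoundTripDemo
{
    // Decodes the bytes as UTF-8 text and encodes that text back to bytes,
    // mirroring GetString (when building the content) followed by GetBytes
    // (in SetRequestContent).
    public static byte[] RoundTripThroughUtf8(byte[] data)
    {
        string decoded = Encoding.UTF8.GetString(data);
        return Encoding.UTF8.GetBytes(decoded);
    }

    // True only if the bytes come back unchanged.
    public static bool SurvivesUtf8RoundTrip(byte[] data)
    {
        return data.SequenceEqual(RoundTripThroughUtf8(data));
    }

    public static void Main()
    {
        // First seven bytes of the XLS file, as listed in the question.
        byte[] header = { 0xD0, 0xCF, 0x11, 0xE0, 0xA1, 0xB1, 0x1A };

        Console.WriteLine(BitConverter.ToString(header));
        Console.WriteLine(BitConverter.ToString(RoundTripThroughUtf8(header)));
        Console.WriteLine(SurvivesUtf8RoundTrip(header)); // False: 0xD0 and 0xCF are not valid UTF-8 here
    }
}
```

Pure ASCII data survives this round trip, which is consistent with the control characters and English text arriving intact in your upload.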

If you must have a string, use a robust encoding like base64 instead:

content.Append(Convert.ToBase64String(element.BinaryContent));
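As a rough illustration of why base64 is safe here, the sketch below encodes the same seven header bytes and decodes them again. The class and method names are hypothetical. Note the assumption that whatever receives the upload knows it must base64-decode the part; if the server expects a plain XLS file, it will store the base64 text instead.

```csharp
using System;
using System.Linq;

public static class Base64RoundTripDemo
{
    // Encodes to base64 text (plain ASCII, so it passes through string
    // handling unchanged) and decodes back to bytes.
    public static bool SurvivesBase64RoundTrip(byte[] data)
    {
        string encoded = Convert.ToBase64String(data);
        byte[] decoded = Convert.FromBase64String(encoded);
        return data.SequenceEqual(decoded);
    }

    public static void Main()
    {
        // First seven bytes of the XLS file, as listed in the question.
        byte[] header = { 0xD0, 0xCF, 0x11, 0xE0, 0xA1, 0xB1, 0x1A };

        Console.WriteLine(Convert.ToBase64String(header));
        Console.WriteLine(SurvivesBase64RoundTrip(header)); // True: base64 is lossless
    }
}
```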

Even better: declare the MIME type in the Content-Type header of the file part (application/vnd.ms-excel for xls; xlsx uses a different type) and write the raw byte sequence directly to the request stream, so that the file never passes through a string and no encoding is needed at all.
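A minimal sketch of that approach, assuming a single file part: only the textual part headers and boundaries are encoded as text, and the file bytes are copied untouched. `MultipartSketch` and `BuildFilePart` are hypothetical names, and the field name, file name and boundary in `Main` are placeholder values based on your sample.

```csharp
using System;
using System.IO;
using System.Text;

public static class MultipartSketch
{
    // Builds a multipart/form-data body containing one file part, as raw bytes.
    public static byte[] BuildFilePart(string boundary, string fieldName,
                                       string fileName, string mimeType, byte[] fileBytes)
    {
        using (var body = new MemoryStream())
        {
            // Part headers are plain text, so encoding them is safe.
            string partHeader =
                "--" + boundary + "\r\n" +
                "Content-Disposition: form-data; name=\"" + fieldName +
                "\"; filename=\"" + fileName + "\"\r\n" +
                "Content-Type: " + mimeType + "\r\n\r\n";
            byte[] headerBytes = Encoding.UTF8.GetBytes(partHeader);
            body.Write(headerBytes, 0, headerBytes.Length);

            // The file content is written byte for byte, never via a string.
            body.Write(fileBytes, 0, fileBytes.Length);

            // Closing boundary marks the end of the multipart body.
            byte[] trailer = Encoding.UTF8.GetBytes("\r\n--" + boundary + "--\r\n");
            body.Write(trailer, 0, trailer.Length);

            return body.ToArray();
        }
    }

    public static void Main()
    {
        byte[] fileBytes = { 0xD0, 0xCF, 0x11, 0xE0, 0xA1, 0xB1, 0x1A };
        byte[] body = BuildFilePart("BNDY", "FILE", "Upload.xls",
                                    "application/vnd.ms-excel", fileBytes);
        Console.WriteLine(body.Length);
    }
}
```

In your `SetRequestContent`, this would mean passing a `byte[]` instead of a `string`, setting `request.ContentType` to `"multipart/form-data; boundary=" + boundary`, and writing the array to the request stream as you already do with `byteData`. The other form fields (username, password) would be written into the same stream as additional parts before the closing boundary.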

References:
- Using HttpWebRequest to POST data/upload image using multipart/form-data

Cee McSharpface