
I am managing a web application that accepts a file from a user and then uploads it to a location on the server. In order to cater to files of large sizes (100 MB and above), I decided to use the blob.slice method. My problem is that after the file is uploaded and I try to download it, the file is twice its original size, which corrupts it. I will show the flow of data from the client side to the server side to illustrate the step-by-step actions of the upload method.

The directive is where the HTML for the input type="file" and the blob-slicing logic are located.

CLIENT SIDE

//Directive
var template = [
    '<div class="file-input">',
        '<div>',
            '<input type="text" ng:model="fileinfo.meta.name" disabled />',
            '<div class="filebrowse">',
                '<button type="button" class="browsemodal">Browse</button>',
                '<input type="file" />',
            '</div>',
        '</div>',
    '</div>'
].join('');

module.exports.init = function (app) {

    app.directive('fileInput', [
        function () {
            return {
                restrict: 'E',
                template: template,
                replace: true,
                scope: {
                    fileinfo : '=ngModel'
                },
                link: function (scope, element) {                    

                    element.bind('change', function (ev) {
                        var fileSize = ev.target.files[0].size;
                        var chunkSize = 64 * 1024;
                        var offset = 0;
                        var self = this;
                        var chunkReaderBlock = null;

                        var readEventHandler = function (evt) {
                            offset += evt.target.result.length;
                            scope.fileinfo.meta = ev.target.files[0];
                            scope.fileinfo.data = ev.target.files[0];
                            scope.fileinfo.sampleData.push(evt.target.result);

                            if (offset >= fileSize) {
                                return;
                            }

                            chunkReaderBlock(offset, chunkSize, ev.target.files[0]);
                        };

                        chunkReaderBlock = function (_offset, length, _file) {
                            var reader = new FileReader();
                            var blob = _file.slice(_offset, length + _offset);

                            reader.onload = readEventHandler;
                            reader.readAsText(blob);
                        };

                        chunkReaderBlock(offset, chunkSize, ev.target.files[0]);
                    });
                }
            }
        }
    ]);
};

scope.fileinfo represents a property called documentInfoModel in the factory as you can see in the snippet below.

//Factory    
documentInfoModel: function () {
    var self = this;
    self.meta = null;
    self.data = null;
    self.sampleData = [];
    return self;
},

Now, as soon as I click the Upload button, it triggers a function named saveData in the controller. This function issues an HTTP POST to the server-side API through the documentService.upsertDocument method. The API is named AddFile. See full details below.

//Controller
$scope.saveData = function () {
    documentService.upsertDocument($scope.fileInfoItem).then(function (data) {
        //File was uploaded successfully
    });
};
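
The documentService.upsertDocument call itself is not shown here; as described above, it posts the fileinfo model as JSON to the AddFile API. A minimal sketch of what that service method could look like (the API_ENDPOINT.addFile constant is a placeholder, not the actual code):

//Service (sketch only - API_ENDPOINT.addFile is a placeholder name)
upsertDocument: function (fileInfoItem) {
    // Posts the whole model (meta, data, sampleData) as the JSON body expected by AddFile
    return $http.post(API_ENDPOINT.addFile, fileInfoItem);
},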

SERVER SIDE

public HttpResponseMessage AddFile(HttpRequestMessage request, [FromBody] DocumentInfoModel file)
{
    using (var transaction = new TransactionScope(TransactionScopeOption.Required, new TimeSpan(0, 30, 0)))
    {
        try
        {
            StringBuilder sb = new StringBuilder();
            foreach (string text in file.sampleData)
                sb.Append(text);

            byte[] data = Encoding.Unicode.GetBytes(sb.ToString());
            var fileLocation = @"C:\Temp\";
            var targetFileName = file.data;

            if (!Directory.Exists(fileLocation))
                Directory.CreateDirectory(fileLocation);

            File.WriteAllBytes(targetFileName, data);
        }
        catch
        {
        }

        return request.CreateResponse(HttpStatusCode.OK);
    }
}

Can anyone help me identify what is wrong with the code? I am also including the download API below in case it helps. Thanks a lot!

private HttpResponseMessage Download(string fileName)
{
    var filePath = @"C:\Temp\";

    var res = new HttpResponseMessage();

    if (!string.IsNullOrEmpty(filePath) && File.Exists(filePath))
    {
        res.Content = new StreamContent(File.OpenRead(filePath));
        res.Content.Headers.ContentType = new MediaTypeHeaderValue("application/octet-stream");
        res.Content.Headers.ContentDisposition = new ContentDispositionHeaderValue("attachment")
        {
            FileName = fileName
        };
        res.StatusCode = HttpStatusCode.OK;
    }
    else
        res.StatusCode = HttpStatusCode.InternalServerError;

    return res;
}
Josh Monreal

2 Answers


Upon seeking help from my colleague, we were able to find a resolution to my problem. Maybe I just don't know how to properly implement the asynchronous methods of FileReader when uploading large files, so we decided to use a different approach. The first thing we did was remove the template inside the directive and modify the directive to something like the one below:

//Directive
app.directive('fileInput', [
        function () {
            return {
                restrict: 'EA',
                replace: true,
                scope: {
                    fileinfo: '=ngModel'
                },
                link: function (scope, element) {                    
                    element.bind('change', function (ev) {
                        scope.$apply(function () {
                            var val = element[0].files[0];
                            scope.fileinfo.fileName = ev.target.files[0];
                            scope.fileinfo.file = val;
                        });
                    });
                }
            }
        }
    ]);

Then we created the template inside the HTML file itself (see below):

<input type="text" ng:model="fileInfoItem.fileName" disabled />
<div class="filebrowse">
    <button type="button" class="browsemodal">Browse</button>
    <input name="file" file-input="fileinfo" ng-model="fileInfoItem" type="file" />
</div>

Next, in the controller we used FormData to store the file and afterwards we sent it to the API.

//Controller
$scope.saveDocument = function () {
    var fd = new FormData();
    fd.append('file', $scope.fileInfoItem.file);
    documentService.upsertDocument($scope.fileInfoItem, fd)
    .then(function (data) { 
        //Upload was successful.
    });
};

//Services
upsertDocument: function (fileInfoItem, data) {
    console.log(data);
    var payload = {
        FileName: fileInfoItem.fileName
    };
    return $http.post(API_ENDPOINT.upsertDocument(fileInfoItem.docId), payload)
        .then(function (ret) {
            return $http.post(API_ENDPOINT.upsertDocumentFile(ret.data), data, {
                withCredentials: false,
                headers: {
                    'Content-Type': undefined
                },
                transformRequest: angular.identity
            });
        });
},

The reason we created two APIs is that we could not pass both the file and the object payload in the POST body to a single API. This might not have been the best solution, but it definitely worked for our application.
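
For reference, the usual way to get both into a single request is to append the JSON payload as an extra FormData field next to the file and parse it on the server. A minimal sketch of that alternative (the API_ENDPOINT.upsertDocumentWithFile name is hypothetical; this is not what we shipped):

//Controller (sketch only - single-request alternative; the endpoint name is hypothetical)
$scope.saveDocumentSingleRequest = function () {
    var fd = new FormData();
    fd.append('file', $scope.fileInfoItem.file);
    // Send the metadata as a JSON string in its own multipart part
    fd.append('payload', JSON.stringify({ FileName: $scope.fileInfoItem.fileName }));

    return $http.post(API_ENDPOINT.upsertDocumentWithFile, fd, {
        headers: { 'Content-Type': undefined }, // let the browser set the multipart boundary
        transformRequest: angular.identity
    });
};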

Josh Monreal

The bad

When you call reader.readAsText(blob) on a binary file, you run the risk of not getting the same data back, especially when binary content is involved.

Take this example, where I created a blob (a text file containing "Testing 1 2 3") in UTF-16 format.

Hint: the resulting buffer will not be the same...

var buffer = new Uint8Array([
  255, 254, 84, 0, 101, 0, 115, 0, 116, 0, 105, 0, 110,
  0, 103, 0, 32, 0, 49, 0, 32, 0, 50, 0, 32, 0, 51, 0
]) // "Testing 1 2 3" buffer in UTF-16

var blob = new Blob([buffer])
var fr = new FileReader
fr.onload = () => {
  console.log(fr.result)
  let buffer = strToUint8(fr.result)
  document.body.innerHTML += '<br>result: ' + Array.from(buffer)
}
fr.readAsText(blob)

function strToUint8(str) {
  let buf = new ArrayBuffer(str.length*2) // 2 bytes for each char
  let bufView = new Uint16Array(buf)
  
  for (let i = 0, strLen = str.length; i < strLen; i++)
    bufView[i] = str.charCodeAt(i)
  
  return new Uint8Array(buf)
}
actual: 255,254,84,0,101,0,115,0,116,0,105,0,110,0,103,0,32,0,49,0,32,0,50,0,32,0,51,0

More on this can be read here: HTML5 File API read as text and binary

The ugly

The ugly part is that you are trying to read the content of each file with the FileReader when in fact you don't need to. It only takes more time to read the content, and it also uses more CPU and memory.

The good

You just have to slice the blob into the size you want without having to read any data, and then upload each chunk as binary (not as text):

var blob = new Blob(['...........'])
var chunks = []

const BYTES_PER_CHUNK = 2; // 2 byte chunk sizes.
const SIZE = blob.size;

var start = 0;
var end = BYTES_PER_CHUNK;

while(start < SIZE) {
  chunks.push(blob.slice(start, end));

  start = end;
  end = start + BYTES_PER_CHUNK;
}

// Upload the chunks one at a time
async function upload(chunks) {
  for (let chunk of chunks) {
    await fetch('/upload', {method: 'post', body: chunk})
  }
}

upload(chunks)
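
Note that the server still has to know where each chunk belongs; one common approach is to send the byte offset along with every chunk so the server can append the pieces in order. A minimal sketch, assuming a /upload endpoint and a custom header name of your own choosing:

// Sketch: tag each chunk with its byte offset so the server can reassemble the file in order
async function upload(chunks) {
  let offset = 0
  for (let chunk of chunks) {
    await fetch('/upload', {
      method: 'post',
      headers: { 'x-chunk-offset': String(offset) }, // hypothetical header name
      body: chunk
    })
    offset += chunk.size
  }
}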
Endless
  • That is a strange example. The input is a UTF-16LE [Byte Order Mark](https://en.wikipedia.org/wiki/Byte_order_mark) followed by "Testing 1 2 3" in UTF-16LE. The output is "Testing 1 2 3" (in UTF-16LE) stripped of the Byte Order Mark. I am not sure it illustrates the point. – georgeawg Dec 10 '16 at 14:41