
I'm trying to split large files (3 GB+) into chunks of 100 MB and send those chunks over HTTP. For testing, I'm working with a 29 MB file (size: 30,380,892 bytes; size on disk: 30,384,128 bytes), so the 100 MB limit condition never triggers at the moment.

This is my code:

List<byte[]> bufferList = new List<byte[]>();
byte[] buffer = new byte[4096];
FileInfo fileInfo = new FileInfo(file);
long length = fileInfo.Length;
int nameCount = 0;
long sum = 0;
long count = 0;

using (FileStream fs = new FileStream(file, FileMode.Open, FileAccess.Read))
{
    while (count < length)
    {
        sum = fs.Read(buffer, 0, buffer.Length);
        count += sum;

        bufferList.Add(buffer);
    }

    var output2 = new byte[bufferList.Sum(arr => arr.Length)];
    int writeIdx2 = 0;
    foreach (var byteArr in bufferList)
    {
        byteArr.CopyTo(output2, writeIdx2);
        writeIdx2 += byteArr.Length;
    }

    HttpUploadBytes(url, output2, ++nameCount + fileName, contentType, path);
}

In this test code I add each buffer I read to a list, and when the read finishes I combine the buffers into one complete array. The problem is that the result I get (the size of output2) is 30,384,128 bytes (the size on disk rather than the actual file size), so the file received on the server is corrupted.

What am I doing wrong?

  • You should basically add a `byte[] currentBuffer = new byte[4096]` *inside* the `while` loop and use `fs.Read(currentBuffer, 0, currentBuffer.Length)` instead – Camilo Terevinto Jun 10 '18 at 14:09
  • I'm also confused how this is even generating chunks (in any _meaningful_ sense). It is ultimately uploading the **entirety** of `output2` (which isn't a chunk). In that case, the vast majority of the code could be replaced with a call to `File.ReadAllBytes` - https://msdn.microsoft.com/en-us/library/system.io.file.readallbytes(v=vs.110).aspx . – mjwills Jun 10 '18 at 14:10
  • @CamiloTerevinto Thanks! – Aa Yy Jun 10 '18 at 14:12
  • @mjwills It does generate chunks! Just to mix them together a couple of lines later, though... – Camilo Terevinto Jun 10 '18 at 14:13
  • You do see you are reading from zero each time: `sum = fs.Read(buffer, 0, buffer.Length);` – paparazzo Jun 10 '18 at 14:51
  • @paparazzo The 0 doesn't mean it is reading from position 0 of the _stream_. See https://msdn.microsoft.com/en-us/library/system.io.filestream.read(v=vs.110).aspx and https://stackoverflow.com/a/6865956/34092 . – mjwills Jun 11 '18 at 11:44

1 Answer


The problem is that you keep adding the same 4 KB buffer to bufferList. That's why the size of the file you receive matches the size on disk: every entry in the list is the full 4,096-byte buffer, including the one for the final partial read, so the total comes out rounded up to the nearest 4 KB.

A bigger problem with your code is that the data you send is wrong, because you keep overwriting the contents of that one shared buffer. Every element of bufferList is a reference to the same array, so if, for example, you send 200 chunks, you send 200 copies of whatever buffer held after the last read.
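To see the aliasing in action, here is a minimal standalone sketch (not from the original post) showing that adding the same array to a list twice stores two references to one array:

using System;
using System.Collections.Generic;

class AliasDemo
{
    static void Main()
    {
        var list = new List<byte[]>();
        byte[] shared = new byte[4];

        shared[0] = 1;
        list.Add(shared);   // stores a reference, not a copy
        shared[0] = 2;
        list.Add(shared);   // the same reference again

        Console.WriteLine(list[0][0]); // prints 2, not 1 - both entries alias one array
    }
}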

The fix is relatively simple - make a copy of the bytes actually read before adding them to bufferList (the cast is needed because sum is declared as long, while Enumerable.Take expects an int):

bufferList.Add(buffer.Take((int)sum).ToArray());

This would fix the size problem, too, because the last chunk will be smaller, its size given by the sum returned from the final Read call. Most importantly, though, bufferList would contain copies of the buffer's contents rather than references to the buffer itself.
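For reference, the read loop from the question with this fix applied would look something like the sketch below (read replaces sum and is typed as int, since both fs.Read and Take work with int; file and the using directives are as in the question):

List<byte[]> bufferList = new List<byte[]>();
byte[] buffer = new byte[4096];
long length = new FileInfo(file).Length;
long count = 0;

using (FileStream fs = new FileStream(file, FileMode.Open, FileAccess.Read))
{
    while (count < length)
    {
        int read = fs.Read(buffer, 0, buffer.Length); // bytes actually read this pass
        if (read == 0)
            break;                                    // defensive: unexpected end of stream
        count += read;
        bufferList.Add(buffer.Take(read).ToArray());  // copy only the bytes read
    }
}

Summing arr.Length over bufferList should now give 30,380,892 bytes, the file's actual size, rather than the 4 KB-rounded 30,384,128.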

Sergey Kalinichenko