
I'm writing a Web API action. The user makes a call to the Web API with a download URL. The Web API should download the file (which can be larger than 5 GB) from the given link and store it on the local disk. The user should be able to make new requests while the file from a previous request is still being downloaded to the server.

I have currently implemented it in the following way. I would like to know if there is a better approach.

public async Task Post(string fileUrl)
{
    // diskPath is the local path where the file will be saved
    using (WebClient client = new WebClient())
    {
        await client.DownloadFileTaskAsync(new Uri(fileUrl), diskPath);
    }
}
user3407500
  • You could try using Thread and ThreadStart objects (System.Threading namespace) to call your download method, allowing multiple requests to your Post method while past requests process independently. Or you could use the System.Threading.Tasks.Task.Factory.StartNew() method – Anthony McGrath May 23 '18 at 00:50
  • This is fine. Async-await is meant for remote processing, not threading or TPL. This will be a non-blocking call as expected; the only point to consider is chunking the download of such a big file and joining it back after download. You would still use async-await – Mrinal Kamboj May 23 '18 at 00:55
  • @AnthonyMcGrath for an across-network call, Threading or TPL is a bad idea; it will choke the system and make it unresponsive – Mrinal Kamboj May 23 '18 at 00:56
  • @MrinalKamboj: could you please share a link where I can see the chunking part? I found examples of chunking while downloading a file from local disk to a server, but my case is to download it from an HTTP link. – user3407500 May 23 '18 at 01:25

1 Answer


I have currently implemented it in the following way. I would like to know if there is a better approach.

I don't see any problem with what you have implemented in terms of your client being able to do other tasks (here, calling other APIs) while the download is going on.

But as a side note, I would suggest not downloading the whole file in one go.

If you do it that way, that single request becomes all-or-nothing: if the download is in progress and the network goes out, the whole download fails and has to start over.

Instead of this:

  • (if you don't have it already) get the whole content as a byte[] (there are multiple ways to do that)
  • divide the whole byte[] into a number of chunks (of reasonable size)
  • download them one by one (or even in parallel if your bandwidth allows)
  • merge them together on the client side
  • and keep a note of which chunks you have already downloaded (updating that information with every response)

This way your download process gains a Pause/Resume feature, and it becomes scalable and robust against failure (after a network issue is resolved you only need to download the remaining data).
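The steps above can be sketched with HttpClient and HTTP Range requests. This is a minimal sketch under stated assumptions, not production code: it assumes the server reports Content-Length on a HEAD request and honors the Range header, and the ChunkedDownloader name, chunkSize, and paths are invented for illustration.

```csharp
using System;
using System.IO;
using System.Net.Http;
using System.Threading.Tasks;

public static class ChunkedDownloader
{
    // Given the total file length and how many bytes are already on disk,
    // compute the inclusive byte range of the next chunk to request.
    public static (long From, long To) NextRange(long totalLength, long chunkSize, long alreadyDownloaded)
    {
        long from = alreadyDownloaded;
        long to = Math.Min(from + chunkSize - 1, totalLength - 1);
        return (from, to);
    }

    // Download fileUrl to diskPath one chunk at a time, resuming from
    // whatever is already on disk (assumes the server supports Range).
    public static async Task DownloadAsync(string fileUrl, string diskPath, long chunkSize)
    {
        using (var http = new HttpClient())
        {
            // HEAD request to learn the total size without downloading the body.
            using (var head = new HttpRequestMessage(HttpMethod.Head, fileUrl))
            using (var headResponse = await http.SendAsync(head))
            {
                long total = headResponse.Content.Headers.ContentLength ?? -1;
                long done = File.Exists(diskPath) ? new FileInfo(diskPath).Length : 0;

                using (var output = new FileStream(diskPath, FileMode.Append, FileAccess.Write))
                {
                    while (done < total)
                    {
                        var (from, to) = NextRange(total, chunkSize, done);
                        var request = new HttpRequestMessage(HttpMethod.Get, fileUrl);
                        request.Headers.Range = new System.Net.Http.Headers.RangeHeaderValue(from, to);
                        using (var response = await http.SendAsync(request))
                        using (var body = await response.Content.ReadAsStreamAsync())
                        {
                            await body.CopyToAsync(output);  // append this chunk to the file
                        }
                        done = to + 1;  // the note of how far we have downloaded
                    }
                }
            }
        }
    }
}
```

Because progress is tracked as the length of the file already on disk, a crashed or paused download resumes from the last completed byte rather than from zero.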


You can divide and merge a byte[] as shown below. List operations are easy here, so convert the array into a list first.

// Suppose your main byte[] is called array
List<byte> mainInputList = array.ToList();

Now divide the whole thing into multiple chunks.

    // making chunks (size of 10 bytes each, for illustration)
    List<List<byte>> listOfChunks = new List<List<byte>>();
    List<byte> chunks = new List<byte>();
    for (int i = 0; i < mainInputList.Count; i++)
    {
        chunks.Add(mainInputList[i]);
        if (chunks.Count == 10)
        {
            listOfChunks.Add(chunks);
            chunks = new List<byte>();
        }
    }
    // don't forget the final partial chunk
    if (chunks.Count > 0)
        listOfChunks.Add(chunks);

Now merge them back.

    //Merging Chunks
    List<byte> finalByteList = new List<byte>();
    foreach(List<byte> chunk in listOfChunks)
    {
        finalByteList.AddRange(chunk);
    }
    byte[] finalByteArr = finalByteList.ToArray();

To make a byte[] from any object, and an object from a byte[]:

How to convert byte array to any type
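As an aside, one common way to do this conversion in newer .NET is System.Text.Json. This is a hedged sketch, not necessarily what the linked answer uses; the ByteConvert name and the types involved are illustrative only, and it applies only to JSON-serializable objects.

```csharp
using System.Text.Json;

public static class ByteConvert
{
    // Serialize any JSON-serializable object into a byte[].
    public static byte[] ToBytes<T>(T value) =>
        JsonSerializer.SerializeToUtf8Bytes(value);

    // Deserialize a byte[] back into the original type.
    public static T FromBytes<T>(byte[] bytes) =>
        JsonSerializer.Deserialize<T>(bytes);
}
```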


Update

Quoting an important point raised in the comments section.

Keep in mind:

If you need to chunk downloads they are obviously of such a large size that I wouldn’t want to store them in memory if they finally get written to the disk anyway. Just for each chunk (which should have a fixed or at least max size) you read, seek to the file position where it belongs and write it. This way you can download arbitrarily large files and are not bounded by memory.
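The quoted approach can be sketched as follows: for each downloaded chunk, seek to its absolute position in the output file and write it there, so nothing larger than one chunk is ever held in memory. The ChunkWriter name and parameters are invented for illustration.

```csharp
using System.IO;

public static class ChunkWriter
{
    // Write one downloaded chunk at its absolute position in the output file.
    // Chunks can therefore arrive in any order without buffering the whole file.
    public static void WriteChunk(string path, long offset, byte[] chunk)
    {
        using (var fs = new FileStream(path, FileMode.OpenOrCreate, FileAccess.Write))
        {
            fs.Seek(offset, SeekOrigin.Begin);  // jump to where this chunk belongs
            fs.Write(chunk, 0, chunk.Length);
        }
    }
}
```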

Amit
  • If you need to chunk downloads they are obviously of such a large size that I wouldn’t want to store them in memory if they finally get written to the disk anyway. Just for each chunk (which should have a fixed or at least max size) you read, seek to the file position where it belongs and write it. This way you can download arbitrarily large files and are not bounded by memory. – ckuri May 23 '18 at 06:32
  • @ckuri of course we should not keep such large data in memory (the whole time the download is going on). I was just showing how we can split and merge a byte[]. Btw, since you are making a point (and the OP got confused by it), should I put your comment as a quote in my updated answer so people will have a hint about it too? – Amit May 23 '18 at 06:54
  • You are free to include my comment. – ckuri May 23 '18 at 07:02