-1

I am working on a downloader that can split a file into multiple parts and download them separately, but currently, if I use more than 2 pieces, the output file is corrupted. I have no idea what's going on, but I think the problem may be in the part that calculates the individual piece sizes.

There is quite a bit of code, so I decided to post it off-site. The code can be found here: GitHub

Any help with this issue is VERY much appreciated. Thank you!

Adam Forbis
0xSingularity
  • I ran the program and received an error on the line that executes `fs.Position = Start;` (line 69). Is this what you encountered? – Ian Jan 12 '16 at 04:43
  • +Ian No. When I ran the program I didn't get any errors. But every time I tried to download the images with more than 2 parts, they ended up corrupt. – 0xSingularity Jan 12 '16 at 17:48
  • "Trying to download with more than two parts": how do you do this programmatically? – Ian Jan 12 '16 at 23:49

1 Answer

1

OK, so there are a number of problems in your code. The real problem isn't that you are breaking the file into more than 2 parts; it is that breaking it into 3 or more parts exposes a race condition in your code.

1) You are writing to the same file from multiple threads, each of which opens the file for append. Every time the file is opened, the end of the file is a moving target, so a piece can land at the wrong offset.

2) Even after fixing the problems centered around the file, GetResponseAsync deadlocked, so I switched to HttpClient instead, which works asynchronously with no deadlocking issues.

3) I applied the KISS principle and simplified your code. It still downloads the .jpg file in pieces, though I think you are making a big assumption that this will be faster than downloading the entire file in one request. I would test that, because your program would be much simpler without the file chunking.

4) One more thing I forgot to mention: you can't have an async entry point in a command-line app, so I added the Task.WaitAll call to avoid the deadlock you might otherwise get. Read "Why is AsyncContext needed when using async/await with a console application?" for more details.
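Since the question suspects the piece-size calculation, here is a minimal sketch of one way to split a file into contiguous, non-overlapping byte ranges and to write each piece at its own offset instead of appending. The names `RangeSketch` and `SplitRanges` are hypothetical and do not come from the original project, and an in-memory byte array stands in for the remote file, so treat this as an illustration of the idea rather than a drop-in replacement.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

static class RangeSketch
{
    // Hypothetical helper: splits [0, totalLength) into `parts` contiguous,
    // non-overlapping ranges. Both ends are inclusive, which matches the
    // HTTP Range header form "bytes=start-end".
    public static List<Tuple<long, long>> SplitRanges(long totalLength, int parts)
    {
        if (parts < 1 || totalLength < parts)
            throw new ArgumentException("need at least one byte per part");

        var ranges = new List<Tuple<long, long>>();
        long partSize = totalLength / parts;
        for (int i = 0; i < parts; i++)
        {
            long start = i * partSize;
            // the last part absorbs the remainder of the integer division
            long end = (i == parts - 1) ? totalLength - 1 : start + partSize - 1;
            ranges.Add(Tuple.Create(start, end));
        }
        return ranges;
    }

    static void Main()
    {
        // stand-in for the remote file (assumption: 10 bytes, values 0..9)
        var source = new byte[10];
        for (int i = 0; i < source.Length; i++) source[i] = (byte)i;

        var ranges = SplitRanges(source.Length, 3);
        ranges.Reverse(); // simulate pieces arriving out of order

        using (var output = new MemoryStream())
        {
            output.SetLength(source.Length);
            foreach (var r in ranges)
            {
                // seek to the piece's own offset instead of appending
                output.Position = r.Item1;
                output.Write(source, (int)r.Item1, (int)(r.Item2 - r.Item1 + 1));
            }
            Console.WriteLine(BitConverter.ToString(output.ToArray()));
        }
    }
}
```

For a 10-byte file and 3 parts this yields the ranges 0-2, 3-5 and 6-9. Because every piece is written at its own start offset, the order in which the pieces arrive does not affect the result.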

This code works every time, doesn't crash, and you can do however many chunks you want. You owe me a beer :-).

using System;
using System.Collections.Generic;
using System.Globalization;
using System.IO;
using System.Net;
using System.Net.Http;
using System.Net.Http.Headers;
using System.Threading.Tasks;

namespace OctaneDownloadEngine
{
    class Program
    {
        static void Main()
        {
            try
            {
                // have to use this because you can't have async entrypoint
                Task.WaitAll(SplitDownload("http://www.hdwallpapers.in/walls/tree_snake_hd-wide.jpg", @"c:\temp\output.jpg"));
            }
            catch (Exception ex)
            {
                Console.Error.WriteLine(ex);
                throw;
            }

            Console.ReadLine();
        }

        public static async Task<string> SplitDownload(string URL, string OUT)
        {
            var responseLength = WebRequest.Create(URL).GetResponse().ContentLength;
            var partSize = (long)Math.Floor(responseLength / 4.00);

            Console.WriteLine(responseLength.ToString(CultureInfo.InvariantCulture) + " TOTAL SIZE");
            Console.WriteLine(partSize.ToString(CultureInfo.InvariantCulture) + " PART SIZE" + "\n");
            var previous = 0;

            var fs = new FileStream(OUT, FileMode.OpenOrCreate, FileAccess.Write, FileShare.None, (int)partSize);
            try
            {
                fs.SetLength(responseLength);

                List<Tuple<Task<byte[]>, int, int>> asyncTasks = new List<Tuple<Task<byte[]>, int, int>>();

                for (var i = (int)partSize; i <= responseLength + partSize; i = (i + (int)partSize) + 1)
                {
                    var previous2 = previous;
                    var i2 = i;

                    // GetResponseAsync deadlocks for some reason so switched to HttpClient instead
                    HttpClient client =  new HttpClient() { MaxResponseContentBufferSize = 1000000 };
                    client.DefaultRequestHeaders.Range = new RangeHeaderValue(previous2, i2);
                    // do not await here: the request is started below without awaiting,
                    // so the parts download concurrently and each range is fetched only once

                    // start each download task and keep track of them for later
                    Console.WriteLine("start {0},{1}", previous2, i2);

                    var downloadTask = client.GetByteArrayAsync(URL);
                    asyncTasks.Add(new Tuple<Task<byte[]>, int, int>(downloadTask, previous2, i2));

                    previous = i2;
                }

                // now that all the downloads are started, we can await the results
                // loop through looking for a completed task in case they complete out of order
                while (asyncTasks.Count > 0)
                {
                    // wait, without spinning the CPU, until at least one download has finished
                    await Task.WhenAny(asyncTasks.ConvertAll(t => t.Item1));
                    var completedTask = asyncTasks.Find(t => t.Item1.IsCompleted);

                    // as each task completes write the data to the file
                    Console.WriteLine("await {0},{1}", completedTask.Item2, completedTask.Item3);
                    var array = await completedTask.Item1;

                    Console.WriteLine("write to file {0},{1}", completedTask.Item2, completedTask.Item3);
                    fs.Position = completedTask.Item2;

                    // write the chunk in one call, stopping before the next chunk's start offset
                    var count = Math.Min(array.Length, completedTask.Item3 - completedTask.Item2);
                    fs.Write(array, 0, count);

                    asyncTasks.Remove(completedTask);
                }
            }
            catch (Exception ex)
            {
                Console.WriteLine(ex.Message);
            }
            finally
            {
                Console.WriteLine("close file");
                fs.Close();
            }
            return OUT;
        }
    }
}
Community
  • "If you take the sleep out, the file will be written in random order of chunks" Wouldn't `fs.Position = Start` set the position where the chunk is written in the file? So how would this write out of order without the sleep? – 0xSingularity Jan 13 '16 at 18:08
  • I revised my answer and posted a new solution. I agree fs.Position should set the position, but it seemed in my testing that it wasn't working that way. There were bigger issues to solve such as the fact that GetResponseAsync would deadlock the process without the sleep so my answer before wasn't going to get you 100% of the way. I think you can take my new answer and run with it. – Tom McAnnally Jan 13 '16 at 21:13
  • Thanks for your help, Tom; that did the trick! Do you have any resources where I can learn more about async/await? – 0xSingularity Jan 13 '16 at 22:02
  • Maybe check out this book if you like books... It seems to be the authority. http://shop.oreilly.com/product/0636920030171.do?cmp=af-code-books-video-product_cj_0636920030171_7489747 – Tom McAnnally Jan 14 '16 at 03:07