10

I have a class responsible for downloading files in a download manager. This class is responsible for downloading the file and writing it to the given path.

The size of the files to download varies normally from 1 to 5 MB but could also be much larger. I'm using an instance of the WebClient class to get the file from the internet.

public class DownloadItem
{
    #region Events
    public delegate void DownloadItemDownloadCompletedEventHandler(object sender, DownloadCompletedEventArgs args);

    public event DownloadItemDownloadCompletedEventHandler DownloadItemDownloadCompleted;

    protected virtual void OnDownloadItemDownloadCompleted(DownloadCompletedEventArgs e)
    {
        DownloadItemDownloadCompleted?.Invoke(this, e);
    }

    public delegate void DownloadItemDownloadProgressChangedEventHandler(object sender, DownloadProgressChangedEventArgs args);

    public event DownloadItemDownloadProgressChangedEventHandler DownloadItemDownloadProgressChanged;

    protected virtual void OnDownloadItemDownloadProgressChanged(DownloadProgressChangedEventArgs e)
    {
        DownloadItemDownloadProgressChanged?.Invoke(this, e);
    }
    #endregion

    #region Fields
    private static readonly Logger Logger = LogManager.GetCurrentClassLogger();
    private WebClient _client;
    #endregion

    #region Properties
    public PlaylistItem Item { get; }
    public string SavePath { get; }
    public bool Overwrite { get; }
    #endregion

    public DownloadItem(PlaylistItem item, string savePath, bool overwrite = false)
    {
        Item = item;
        SavePath = savePath;
        Overwrite = overwrite;
    }

    public void StartDownload()
    {
        if (File.Exists(SavePath) && !Overwrite)
        {
            OnDownloadItemDownloadCompleted(new DownloadCompletedEventArgs(true));
            return;
        }

        OnDownloadItemDownloadProgressChanged(new DownloadProgressChangedEventArgs(1));
        Item.RetreiveDownloadUrl();

        if (string.IsNullOrEmpty(Item.DownloadUrl))
        {
            OnDownloadItemDownloadCompleted(new DownloadCompletedEventArgs(true, new InvalidOperationException("Could not retreive download url")));
            return;
        }

        // GCSettings.LargeObjectHeapCompactionMode = GCLargeObjectHeapCompactionMode.CompactOnce;
        using (_client = new WebClient())
        {
            _client.Headers.Add("user-agent", "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.2; .NET CLR 1.0.3705;)");

            try
            {
                _client.DownloadDataCompleted +=
                    (sender, args) =>
                    {
                        Task.Run(() =>
                        {
                            DownloadCompleted(args);
                        });
                    };
                _client.DownloadProgressChanged += (sender, args) => OnDownloadItemDownloadProgressChanged(new DownloadProgressChangedEventArgs(args.ProgressPercentage));
                _client.DownloadDataAsync(new Uri(Item.DownloadUrl));
            }
            catch (Exception ex)
            {
                Logger.Warn(ex, "Error downloading track {0}", Item.VideoId);
                OnDownloadItemDownloadCompleted(new DownloadCompletedEventArgs(true, ex));
            }
        }
    }

    private void DownloadCompleted(DownloadDataCompletedEventArgs args)
    {
        // _client = null;

        // GCSettings.LargeObjectHeapCompactionMode = GCLargeObjectHeapCompactionMode.CompactOnce;
        // GC.Collect(2, GCCollectionMode.Forced);

        if (args.Cancelled)
        {
            OnDownloadItemDownloadCompleted(new DownloadCompletedEventArgs(true, args.Error));
            return;
        }

        try
        {
            File.WriteAllBytes(SavePath, args.Result);

            using (var file = TagLib.File.Create(SavePath))
            {
                file.Save();
            }

            try
            {
                MusicFormatConverter.M4AToMp3(SavePath);
            }
            catch (Exception)
            {
                // ignored
            }

            OnDownloadItemDownloadCompleted(new DownloadCompletedEventArgs(false));
        }
        catch (Exception ex)
        {
            OnDownloadItemDownloadCompleted(new DownloadCompletedEventArgs(true, ex));
            Logger.Error(ex, "Error writing track file for track {0}", Item.VideoId);
        }
    }

    public void StopDownload()
    {
        _client?.CancelAsync();
    }

    public override int GetHashCode()
    {
        return Item.GetHashCode();
    }

    public override bool Equals(object obj)
    {
        var item = obj as DownloadItem;

        return Item.Equals(item?.Item);
    }
}

Every download causes a very large memory increase compared with the file size of the downloaded item. If I download a file with a size of ~3 MB the memory usage is increasing about 8 MB.

As you can see the download is producing much LOH which is not cleared after the download. Even forcing the GC or the setting GCSettings.LargeObjectHeapCompactionMode = GCLargeObjectHeapCompactionMode.CompactOnce; is not helping to prevent this memory leak.

Comparing Snapshot 1 and 2 you can see that the amount of memory is produced by byte arrays which might be the download result.

Doing several downloads shows how terrible this memory leak is.

In my opinion this is caused by the WebClient instance in any way. However I can't really determine what exactly is causing this issue. It doesn't even matters if I force the GC. This screen here shows it without forced gc:

What is causing this overheat and how can I fix it? This is a major bug and imagining 100 or more downloads the process would run out of memory.

Edit


As suggested I commented out the section responsible for setting the tags and converting the M4A to an MP3. However the converter is just a call of FFMPEG so it shouldn't be a memory leak:

class MusicFormatConverter
{
    public static void M4AToMp3(string filePath, bool deleteOriginal = true)
    {
        if(string.IsNullOrEmpty(filePath) || !filePath.EndsWith(".m4a"))
            throw new ArgumentException(nameof(filePath));

        var toolPath = Path.Combine("tools", "ffmpeg.exe");

        var convertedFilePath = filePath.Replace(".m4a", ".mp3");
        File.Delete(convertedFilePath);

        var process = new Process
        {
            StartInfo =
            {
                FileName = toolPath,
#if !DEBUG
                WindowStyle = ProcessWindowStyle.Hidden,
#endif
                Arguments = $"-i \"{filePath}\" -acodec libmp3lame -ab 128k \"{convertedFilePath}\""
            }
        };

        process.Start();
        process.WaitForExit();

        if(!File.Exists(convertedFilePath))
            throw new InvalidOperationException("File was not converted successfully!");

        if(deleteOriginal)
            File.Delete(filePath);
    }
}

The DownloadCompleted() method looks now like this:

private void DownloadCompleted(DownloadDataCompletedEventArgs args)
{
    // _client = null;

    // GCSettings.LargeObjectHeapCompactionMode = GCLargeObjectHeapCompactionMode.CompactOnce;
    // GC.Collect(2, GCCollectionMode.Forced);

    if (args.Cancelled)
    {
        OnDownloadItemDownloadCompleted(new DownloadCompletedEventArgs(true, args.Error));
        return;
    }

    try
    {
        File.WriteAllBytes(SavePath, args.Result);

        /*
        using (var file = TagLib.File.Create(SavePath))
        {
            file.Save();
        }

        try
        {
            MusicFormatConverter.M4AToMp3(SavePath);
        }
        catch (Exception)
        {
            // ignore
        }
        */

        OnDownloadItemDownloadCompleted(new DownloadCompletedEventArgs(false));
    }
    catch (Exception ex)
    {
        OnDownloadItemDownloadCompleted(new DownloadCompletedEventArgs(true, ex));
        Logger.Error(ex, "Error writing track file for track {0}", Item.VideoId);
    }
}

The result after downloading 7 items: It seems like this was not the memory leak.

As an addition I'm submitting the DownloadManager class too as it is handling the whole download operation. Maybe this could be the source of the problem.

public class DownloadManager
{
    #region Fields
    private static readonly Logger Logger = LogManager.GetCurrentClassLogger();
    private readonly Queue<DownloadItem> _queue;
    private readonly List<DownloadItem> _activeDownloads;
    private bool _active;
    private Thread _thread;
    #endregion

    #region Construction
    public DownloadManager()
    {
        _queue = new Queue<DownloadItem>();
        _activeDownloads = new List<DownloadItem>();
    }
    #endregion

    #region Methods
    public void AddToQueue(DownloadItem item)
    {
        _queue.Enqueue(item);

        StartManager();
    }

    public void Abort()
    {
        _thread?.Abort();

        _queue.Clear();
        _activeDownloads.Clear();
    }

    private void StartManager()
    {
        if(_active) return;

        _active = true;

        _thread = new Thread(() =>
        {
            try
            {
                while (_queue.Count > 0 && _queue.Peek() != null)
                {
                    DownloadItem();

                    while (_activeDownloads.Count >= Properties.Settings.Default.ParallelDownloads)
                    {
                        Thread.Sleep(10);
                    }
                }

                _active = false;
            }
            catch (ThreadInterruptedException)
            {
                // ignored
            }
        });
        _thread.Start();
    }

    private void DownloadItem()
    {
        if (_activeDownloads.Count >= Properties.Settings.Default.ParallelDownloads) return;

        DownloadItem item;
        try
        {
            item = _queue.Dequeue();
        }
        catch
        {
            return;
        }

        if (item != null)
        {
            item.DownloadItemDownloadCompleted += (sender, args) =>
            {
                if(args.Error != null)
                    Logger.Error(args.Error, "Error downloading track {0}", ((DownloadItem)sender).Item.VideoId);

                _activeDownloads.Remove((DownloadItem) sender);
            };

            _activeDownloads.Add(item);
            Task.Run(() => item.StartDownload());
        }
    }
    #endregion
Christian Klemm
  • 1,455
  • 5
  • 28
  • 49
  • What is your .NET version? From your code, it says: NET CLR 1.0.3705 – Matt Oct 12 '16 at 13:16
  • I'm using .NET Framework 4.5.2 – Christian Klemm Oct 12 '16 at 13:30
  • WebClient does not have a leak. Clearly you ought to be much more concerned about "Taglib" and "MusicFormatConverter", classes that so unlike WebClient are *not* tested millions of times every day. Use a decent memory profiler to get ahead. – Hans Passant Oct 12 '16 at 13:52
  • I edited my post and tried it without the Taglib and MusicFormatConverter part. The result is still the same. – Christian Klemm Oct 12 '16 at 14:04
  • I fear that the WebClient is not disposed properly. That would explain the high overheat. Maybe there is an explanation for this but I have no clue atm. – Christian Klemm Oct 12 '16 at 14:26
  • I believe it's by design. WebClient behaves like this when using the DownloadData and DownloadString methods. Behind the scene it uses a buffer backed by byte arrays that can double its size each time it thinks its needed: https://referencesource.microsoft.com/#System/net/System/Net/_ScatterGatherBuffers.cs,2e60233bb771e7a1,references . There are two solutions: you can use DownloadFile instead wich uses a Stream internally, which is the easiest way, and in the end, that's what you do, or use the OpenRead methods and stream things by yourself and avoid huge byte[] allocations. – Simon Mourier Oct 15 '16 at 06:06
  • @SimonMourier Yesterday I tried the DownloadFileAsync method but it end up in the same situation like the DownloadDataAsync method. It still produces a large overheat compared to the downloaded size. I think I am forced to use the second method you described. – Christian Klemm Oct 15 '16 at 11:24
  • @chris579: We can't see your images in the question. – CharithJ Oct 16 '16 at 22:09
  • @CharithJ Strange, I can see them. However this issue was already fixed. – Christian Klemm Oct 16 '16 at 22:21
  • @HansPassant WebClient has a leak when used asynchronously. See this question: https://stackoverflow.com/questions/53350298/large-unexplained-memory-in-the-memory-dump-of-a-net-process 2OP: thanks for your question and workaround!!! – Alex from Jitbit Feb 07 '21 at 16:12

1 Answers1

5

Finally, after dozens of profilings and memory checking the issue is resolved now.

As @SimonMourier already stated this issue is related to the design of the UploadFile, DownloadData, DownloadString and DownloadFile methods. Looking into the backend of them you can see that all of them are using the private DownloadBits method in the WebClient class with this signature:

private byte[] DownloadBits(WebRequest request, Stream writeStream, CompletionDelegate completionDelegate, AsyncOperation asyncOp)

Regarding the return type it is clear why the behaviour is like I discovered: When using the above mentioned methods the content is saved in a byte array. Therefore it is not recommended to use these methods if the file size is > 85,000 bytes as this would result in filling the LOH until the memory limit is reached. This might not be important if the files are small but with growing size the LOH is also growing by a multiple.

As an addition here my final solution:

public class DownloadItem : DownloadManagerItem
{
    #region Fields

    private static readonly Logger Logger = LogManager.GetCurrentClassLogger();

    private WebClient _webClient;

    #endregion

    #region Properties

    public string SavePath { get; }
    public bool Overwrite { get; }
    public DownloadFormat DownloadFormat { get; }

    #endregion

    public DownloadItem(PlaylistItem item, string savePath, DownloadFormat downloadFormat, bool overwrite = false)
        : base(item)
    {
        SavePath = savePath;
        Overwrite = overwrite;
        DownloadFormat = downloadFormat;
    }

    public override void StartDownload()
    {
        if (File.Exists(SavePath) && !Overwrite)
        {
            OnDownloadItemDownloadCompleted(new DownloadCompletedEventArgs(true));
            return;
        }

        OnDownloadItemDownloadProgressChanged(new DownloadProgressChangedEventArgs(1));
        Item.RetreiveDownloadUrl();

        if (string.IsNullOrEmpty(Item.DownloadUrl))
        {
            OnDownloadItemDownloadCompleted(new DownloadCompletedEventArgs(true,
                new InvalidOperationException("Could not retreive download url")));
            return;
        }

        using (_webClient = new WebClient())
        {
            _webClient.Headers.Add("user-agent",
                "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.2; .NET CLR 1.0.3705;)");

            try
            {
                _webClient.OpenReadCompleted += WebClientOnOpenReadCompleted;

                _webClient.OpenReadAsync(new Uri(Item.DownloadUrl));
            }
            catch (Exception ex)
            {
                Logger.Warn(ex, "Error downloading track {0}", Item.VideoId);
                OnDownloadItemDownloadCompleted(new DownloadCompletedEventArgs(true, ex));
            }
        }
    }

    private void WebClientOnOpenReadCompleted(object sender, OpenReadCompletedEventArgs openReadCompletedEventArgs)
    {
        _webClient.Dispose();

        if (openReadCompletedEventArgs.Cancelled)
        {
            OnDownloadItemDownloadCompleted(new DownloadCompletedEventArgs(true, openReadCompletedEventArgs.Error));
            return;
        }

        if (!Overwrite && File.Exists(SavePath))
            return;

        var totalLength = 0;
        try
        {
            totalLength = int.Parse(((WebClient)sender).ResponseHeaders["Content-Length"]);
        }
        catch (Exception)
        {
            // ignored
        }

        try
        {
            long processed = 0;
            var tmpPath = Path.GetTempFileName();

            using (var stream = openReadCompletedEventArgs.Result)
            using (var fs = File.Create(tmpPath))
            {
                var buffer = new byte[16 * 1024];
                int read;

                while ((read = stream.Read(buffer, 0, buffer.Length)) > 0)
                {
                    fs.Write(buffer, 0, read);

                    processed += read;
                    OnDownloadItemDownloadProgressChanged(new DownloadProgressChangedEventArgs(processed, totalLength));
                }
            }

            File.Move(tmpPath, SavePath);

            OnDownloadItemDownloadCompleted(new DownloadCompletedEventArgs(false));
        }
        catch (Exception ex)
        {
            OnDownloadItemDownloadCompleted(new DownloadCompletedEventArgs(true, ex));
        }
    }

    public override void StopDownload()
    {
        _webClient?.CancelAsync();
    }

    public override void Dispose()
    {
        _webClient?.Dispose();
    }

    public override int GetHashCode()
    {
        return Item.GetHashCode();
    }

    public override bool Equals(object obj)
    {
        var item = obj as DownloadItem;

        return Item.Equals(item?.Item);
    }
}

However thanks for the help!

Christian Klemm
  • 1,455
  • 5
  • 28
  • 49
  • Could it also be a solution to use the .Net `BufferedStream` class ([MS docs](https://learn.microsoft.com/en-us/dotnet/api/system.io.bufferedstream?view=net-7.0)) for reading in chunks to bypass the LOH? – Maddin Jan 12 '23 at 15:41
  • @Maddin using any Stream is fine since they have an underlying buffer that fits into the memory size constraints anyways. However, if you acquire a stream from an `HttpClient` you cannot control what `Stream` implementation you get anyways. – Christian Klemm Jan 13 '23 at 16:11
  • Thanks for clarification. Shouldn't it be possible to create a `BufferedStream` from the `Stream` return by `WebClient.OpenReadAsync`? The ctor takes a stream. – Maddin Jan 16 '23 at 10:21
  • 1
    @Maddin that would be quite pointless since you then would have two buffers that need to be allocated. The returning stream already contains a buffer and a buffered stream contains one as well. Don't bother what implementation of `Stream` you are working with, it's only important that you work with streams rather than big byte arrays. And you really should use `HttpClient` instead of `WebClient`. – Christian Klemm Jan 17 '23 at 17:07