1

Here's the deal:

I have files on remote server that ought to be accessed by different threads within my application. The former developer used to upload them thousand of times and I decided to change it slightly.

This is a fully operational example that mimics the file loaded I have so far:

public class FileLoader<T>
{
    private object locker = new object();
    /// <summary>
    /// Key - Path to the file / Value - Task that is loading the file by the path
    /// </summary>
    private ConcurrentDictionary<string, Task<T>> fileDownloads = new ConcurrentDictionary<string, Task<T>>();

    // counters for the test
    private int countOfDownloadsStarted = 0;
    public int CountOfDownloadsStarted => countOfDownloadsStarted;

    private int countOfAwaitings = 0;
    public int CountOfAwaitings => countOfAwaitings;

    public async Task<T> GetTileFromFile(string path)
    {
        var exist = fileDownloads.TryGetValue(path, out Task<T> download);

        if (!exist)
        {
            Monitor.Enter(locker);

            exist = fileDownloads.TryGetValue(path, out download);

            if (!exist)
            {
                Interlocked.Increment(ref countOfDownloadsStarted);

                download = Task.Run(() =>
                {
                    // mimic file loading by a given path
                    Thread.Sleep(1000);
                    return default(T);
                });

                fileDownloads.TryAdd(path, download);
            }

            Monitor.Exit(locker);
        }

        Interlocked.Increment(ref countOfAwaitings);
        return await download;
    }
}

And a program class with entry point for you to test it by your own:

class Program
{
    static void Main(string[] args)
    {
        for (int i = 0; i < 10; i++)
        {
            Console.WriteLine($"Test: {i + 1}");
            Foo();
        }
        Console.ReadLine();
    }

    static async Task Foo()
    {
        var paths = new List<string>();

        // creating fake paths 
        var pathsToReplicate = new int[] { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 };

        for (int i = 0; i < pathsToReplicate.Length; i++)
        {
            for (int j = 0; j < pathsToReplicate[i]; j++)
            {
                paths.Add(pathsToReplicate[i].ToString());
            }
        }

        var fileLoader = new FileLoader<bool>();

        try
        {
            Parallel.ForEach(paths, async (p) => await fileLoader.GetTileFromFile(p));
        }
        catch (Exception ex)
        {
            Console.WriteLine($"Error: {ex.GetBaseException().Message}");
        }

        Console.WriteLine($"Count of downloads matches: {fileLoader.CountOfDownloadsStarted == pathsToReplicate.Length}");
        Console.WriteLine($"Count of awaitings matches: {fileLoader.CountOfAwaitings == pathsToReplicate.Sum(p => p)}");
    }
}

If you launch it as it is - you'll pass the test ten times and the numbers will match. But the problem here is that I use the same lock object even when the paths that are processing are different. It obviously causes redundant context switching, hence the resources wasting.

Firstly I thought that it would be okay to use the paths themselves as they're strings and they're immutable. But then I realized that it's not a good idea because of the same reason. Someone outside can easily lock the same object.

So here I am, looking for an elegant solution that, I am sure, lies on the surface.

Gleb
  • 1,723
  • 1
  • 11
  • 24
  • 2
    First of all while it's clear what is your concern regrading current implementation it's not clear whether this cause any real problem of the application functioning. It can be that it's worth to leave it as is if this single lock doesn't cause any noticeable impact. But anyway if you want to split synchronization process between different files you'd need to make a decomposition in your classes to have a class which would represent a file specific operation instead of single class which handles all files. Concurrent operations per file can be synchronized by file operation class lock then. – Dmytro Mukalov Dec 19 '19 at 19:59
  • In your place I wouldn't be concerned so much about the potential performance problems due to contention caused by the single lock. I would be more concerned about the potential correctness issues caused by mixing `Interlocked` operations with locks, by not guaranteed invocation of `Monitor.Exit`, and by calling `Parallel.ForEach` [with async delegate](https://stackoverflow.com/questions/59412905/parallel-foreach-not-awaiting-for-httpclient). – Theodor Zoulias Dec 19 '19 at 20:32
  • @TheodorZoulias Thank you for your comment. As for Parallel.ForEach, it's here just for example. It's obviously that I don't use it in my production code. As for Monitor.Enter and Monitor.Exit well I oversimplified things for the example and I sure manage these things in my production code. But if you trying to make a hint that it would be better to use lock here instead of Monitor's explicitly then you're wrong as I don't want my coming threads to access the corrupt data in case of exception. – Gleb Dec 19 '19 at 20:38
  • OK Gleb. My comment was of topic anyway, since this is not the [correct site](https://codereview.stackexchange.com/) to perform code reviews. :-) – Theodor Zoulias Dec 19 '19 at 20:43

0 Answers0