Using threading to generate an expensive file and handle other requests for that file

Question

I am writing a web service that generates, caches and serves zip files.

If a requested file doesn't exist in the cache, it is generated and then served. Depending on the request, it can take quite some time to generate this file. It is possible for another request for the same zip file to come in as it is still being generated on the first request.

A basic scenario might go like this

thread 1: Give me bigfile.zip
thread 1: bigfile.zip doesn't exist
thread 1: Generating bigfile.zip
thread 2: Give me bigfile.zip
thread 2: Thread 1 is generating bigfile.zip - wait for it to finish
thread 1: Finished generating bigfile.zip
thread 1: Serving bigfile.zip
thread 2: Serving bigfile.zip

So I am considering using a Thread to achieve this and using Join() to synchronise them once the file is ready.

But here I have a problem. How would I go about managing several requests for several different files? I was thinking of using a Dictionary<fileId, Thread> to keep track of them, but then how could I safely remove a thread from the dictionary when it has finished its process? I can't see any way of doing it without putting a lock around the whole thing - including the actual process itself. Of course, doing that would seem to make the whole idea of threading redundant in the first place.

lock(_myLocker)
{
    if(!fileThreads.ContainsKey(fileId))
    {
        Thread myThread = MakeMeAThread();
        fileThreads.Add(fileId, myThread);
    }
    fileThreads[fileId].Join();
    //We have to do the Join inside the lock, this is the only way we know (in a threadsafe manner) that the dictionary definitely contains our key
}
ServeTheFile();
//How do I clean up the no longer required fileThreads[fileId]?

To add to the difficulty, there is another way of consuming the service that simply tells the client the status of the file being requested (unavailable (404), being generated, ready).

thread 1: Give me bigfile.zip
thread 1: bigfile.zip doesn't exist
thread 1: Generating bigfile.zip
thread 2: Give me bigfile.zip
thread 2: Thread 1 is generating bigfile.zip - wait for it to finish
thread 3: Do you have bigfile.zip? - No, it's being generated
thread 1: Finished generating bigfile.zip
thread 1: Serving bigfile.zip
thread 2: Serving bigfile.zip
thread 4: Do you have bigfile.zip? Yes, it's ready for you
thread 5: Do you have invalid.zip? No, that's an invalid request

So, can you see why we can't just put a lock around the process? If we did, Thread 3 couldn't be told that the file is being generated and would have to wait for the file generation to finish.

Make the thread remove itself from the dictionary when it is done. Better yet, use Task. That way you can attach a continuation. — usr, Jun 23 '14 at 10:40
After writing my answer (see below), I re-read your whole question. Your last sentence says you want to avoid locks because a request would have to wait while the requested file is being generated. So what should happen in that scenario? Should the thread tell the client that the file is being generated, so that the client has to poll until the file is available? – AcidJunkie, Jun 23 '14 at 10:40
So instead of using `lock`, use [Monitor.TryEnter](http://msdn.microsoft.com/en-us/library/system.threading.monitor.tryenter(v=vs.110).aspx), etc. If you're doing this in ASP.NET, then the thing that generates and caches files should probably be a Windows service rather than a thread in the ASP.NET context. — Jim Mischel, Jun 23 '14 at 15:51
@AcidJunkie, yes, the client would have to poll until the file is available. As it's a web application, the default method is just to request the file and wait. But when we add JavaScript, we can use AJAX to poll for file availability. – Iain Fraser, Jun 23 '14 at 23:18
Hi @JimMischel. Unfortunately, I'm not allowed to use Windows services in this instance. – Iain Fraser, Jun 24 '14 at 00:37
You may very well be able to do this with a [ConcurrentDictionary](http://msdn.microsoft.com/en-us/library/dd287191(v=vs.110).aspx) — Jim Mischel, Jun 24 '14 at 03:05
Thanks to JimMischel and usr I'm thinking the solution might be to use a ConcurrentDictionary and Tasks. Tasks rather than Threads because of what appear to be their inherent advantages in this case (http://stackoverflow.com/questions/4130194/what-is-the-difference-between-task-and-thread) — Iain Fraser, Jun 24 '14 at 03:31
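
The ConcurrentDictionary-and-Task idea from the comments above might look roughly like the sketch below. It is only one possible shape, not a tested implementation: the names `ZipCache`, `FileStatus`, `GetFileAsync`, `GetStatus`, and the `isValidRequest` and `generate` delegates are all hypothetical, and it assumes cached files live in a single directory and that a valid file nobody has requested yet is reported as unavailable. Wrapping the task in `Lazy<T>` is an added detail so that the generator is started at most once per dictionary entry, and the task removes its own entry when it finishes, as usr suggested.

```csharp
using System;
using System.Collections.Concurrent;
using System.IO;
using System.Threading.Tasks;

// The three states from the question: unavailable (404), being generated, ready.
public enum FileStatus { Unavailable, Generating, Ready }

public class ZipCache
{
    // One entry per file that is currently being generated.
    private readonly ConcurrentDictionary<string, Lazy<Task<string>>> _inFlight =
        new ConcurrentDictionary<string, Lazy<Task<string>>>();

    private readonly string _cacheDirectory;
    private readonly Func<string, bool> _isValidRequest;   // hypothetical validity check
    private readonly Action<string, string> _generate;     // hypothetical: (fileId, outputPath)

    public ZipCache(string cacheDirectory,
                    Func<string, bool> isValidRequest,
                    Action<string, string> generate)
    {
        _cacheDirectory = cacheDirectory;
        _isValidRequest = isValidRequest;
        _generate = generate;
    }

    // Returns a task that completes with the path of the finished file.
    // Concurrent callers asking for the same file share one task.
    public Task<string> GetFileAsync(string fileId)
    {
        if (!_isValidRequest(fileId))
            throw new FileNotFoundException("Invalid request", fileId);

        string path = Path.Combine(_cacheDirectory, fileId);
        if (File.Exists(path))
            return Task.FromResult(path);

        // GetOrAdd may build several Lazy objects under contention, but only the
        // stored one ever has .Value read, so generation starts at most once.
        Lazy<Task<string>> entry = _inFlight.GetOrAdd(
            fileId, id => new Lazy<Task<string>>(() => GenerateAsync(id, path)));
        return entry.Value;
    }

    // Answers the status request without waiting for any generation to finish.
    public FileStatus GetStatus(string fileId)
    {
        if (!_isValidRequest(fileId)) return FileStatus.Unavailable;
        if (File.Exists(Path.Combine(_cacheDirectory, fileId))) return FileStatus.Ready;
        return _inFlight.ContainsKey(fileId) ? FileStatus.Generating : FileStatus.Unavailable;
    }

    private Task<string> GenerateAsync(string fileId, string path)
    {
        return Task.Run(() =>
        {
            try
            {
                // Re-check: another task may have finished between the caller's
                // File.Exists check and the creation of this entry.
                if (!File.Exists(path))
                {
                    // Write to a temporary name so the final path only ever
                    // appears once the file is complete.
                    string temp = path + ".tmp";
                    _generate(fileId, temp);
                    File.Move(temp, path);
                }
                return path;
            }
            finally
            {
                // The task cleans up its own dictionary entry.
                Lazy<Task<string>> removed;
                _inFlight.TryRemove(fileId, out removed);
            }
        });
    }
}
```

A request handler would wait on (or await) the task returned by `GetFileAsync("bigfile.zip")` and then serve the returned path, while the status endpoint calls `GetStatus` and returns immediately. No lock is held during generation, so requests for other files and status checks are not blocked.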

score 0 · Answer 1 · answered Jun 23 '14 at 10:03

This is a very simple solution:
Let's assume the identifier for a file is its name. You could create a dictionary holding the lock objects, e.g.:

Dictionary<string, object> _fileLocks = new Dictionary<string, object>();

So when a request for a file generation comes in, you first lock the dictionary object. Then you check if it already contains a lock object. If not, add one. Otherwise get the current one.

object lockObject;
lock (_fileLocks)
{
    if (_fileLocks.TryGetValue(fileName, out lockObject) == false)
    {
        lockObject = new object();
        _fileLocks.Add(fileName, lockObject);
    }
}

Then lock the lock object and perform your work:

lock (lockObject)
{
    // check if the file has been created. if not, generate it
    // load and return the file
}

This way, other requests for the same file will automatically wait until it is generated.
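
Put together, the two steps above might look like the following sketch. The class name `LockedFileCache`, the `GetFile` method, and the `generateFile` delegate are hypothetical, and it assumes the file name doubles as the path on disk; this is an illustration of the per-file lock idea rather than a complete solution.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public class LockedFileCache
{
    // Maps each file name to the object used to serialise work on that file.
    private readonly Dictionary<string, object> _fileLocks = new Dictionary<string, object>();

    // Hypothetical generator: creates the file at the given path.
    private readonly Action<string> _generateFile;

    public LockedFileCache(Action<string> generateFile)
    {
        _generateFile = generateFile;
    }

    public byte[] GetFile(string fileName)
    {
        object lockObject;

        // Step 1: a short lock on the dictionary, just to find or create
        // the lock object for this file.
        lock (_fileLocks)
        {
            if (!_fileLocks.TryGetValue(fileName, out lockObject))
            {
                lockObject = new object();
                _fileLocks.Add(fileName, lockObject);
            }
        }

        // Step 2: a long lock on the per-file object. Requests for other
        // files use different lock objects and are not blocked.
        lock (lockObject)
        {
            if (!File.Exists(fileName))
            {
                _generateFile(fileName);
            }
            return File.ReadAllBytes(fileName);
        }
    }
}
```

Only requests for the same file name queue behind one another. As written, lock objects are never removed, so the dictionary grows by one small object per distinct file name requested.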

AcidJunkie


Not a complete solution. You need to add a check for when the item is deleted from the dictionary, or you will fall between phase 1 and 2 – Erez Robinson Jun 23 '14 at 10:16
This solution doesn't remove anything from the dictionary. Or do I misunderstand what you're pointing to? – AcidJunkie Jun 23 '14 at 10:22
The question also pointed out that because this is a cache scheme, files will be removed eventually. And also you did not point to where you remove the file lock. – Erez Robinson Jun 23 '14 at 10:29
It doesn't need to. When the code hits the second code block, it checks whether the file exists. If not, it will be generated again. The dictionary only contains the corresponding lock object for each file name. – AcidJunkie Jun 23 '14 at 10:38
I'm pointing this out a second time: your solution only adds locks and never removes them. When do you remove from _fileLocks? And what happens if you are just before lock(lockObject) and someone removes the lock from _fileLocks? It's not that your direction is not good, it's just not complete. – Erez Robinson Jun 23 '14 at 11:03
:) Again, you don't remove any lock object from _fileLocks. Why should you? – AcidJunkie Jun 23 '14 at 11:28
Why should you? Because otherwise your dictionary could fill up with locks that aren't ever going to be used again. – Jim Mischel Jun 23 '14 at 15:53
Hi @AcidJunkie. Thank you very much for your answer, but as the others are pointing out it isn't a solution as much as it's very succinctly summarising the problem I actually have. I want to be able to safely perform CRUD operations on this dictionary in a multi-threaded environment. I can't see any way to do it that doesn't involve doing everything inside a lock (making multi-threading pointless) or creating a dictionary that could get as large as there are possible combinations of valid files to request. – Iain Fraser Jun 23 '14 at 23:59
