
I have a simple web application module which basically accepts requests to save a zip file on PageLoad from a mobile client app.

Now, what I want to do is unzip the file, read the file inside it, and process it further, including making entries into a database.

Update: the zip file and its contents will be fairly small in size, so the server shouldn't be burdened with much load.

Update 2: I just read about how IIS queues requests (at the global/app level). So does that mean that I don't need to implement a complex request-handling mechanism and IIS can take care of the app by itself?

Update 3: I am looking to offload the processing of the downloaded zip, not only to minimize the overhead (in terms of performance) but also to avoid table-locking problems when the file is processed and records are updated in the same table. With multiple devices requesting the page, the background task updating the database in parallel could cause an exception.

As of now I have zeroed in on two solutions:

  • To implement a concurrent/message queue
  • To implement the file processing code into a separate tool and schedule a job on the server to check for non-processed file(s) and process them serially.

I am inclined towards the queuing mechanism and will try to implement it, as it seems less dependent on configuration compared with manually setting up the job/schedule on the server side.

So, what do you guys recommend me for this purpose?

Moreover, after the zip file is requested and saved on the server side, the client/server connection is released. I am not looking to burden my IIS.

Imagine a couple of hundred clients simultaneously requesting the page..

I actually haven't used either of them before, so any samples or how-tos will be much appreciated.

beerBear
  • What should happen if your server crashes while processing? Should it (i) resume processing on restart or (ii) lose any work queued or in progress? – Ian Mercer Feb 04 '13 at 07:29
  • @IanMercer That would be (i). As soon as the zip file is saved it should unzip the contents..start reading the file..make entries according to the file, in **one go**; that's what I am after. Losing data/tasks being queued up is not an option. :| – beerBear Feb 04 '13 at 07:34
  • Absent an infinite number of infinitely fast cores that's not possible. You will need to queue the request in a persistent queue (database or database-backed queue) and handle the processing in a separate process if you want to get close to your goal of no data loss. – Ian Mercer Feb 04 '13 at 07:54

3 Answers


I'd recommend TPL and Rx Extensions: make your unzipped file list an observable collection and, for each item, start a new task asynchronously.
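A minimal sketch of the TPL side of this, assuming a `HandleZipFile` placeholder method for the actual unzip-and-process work (the names here are hypothetical, not from the original answer):

```csharp
using System.Threading.Tasks;

public class ZipProcessor
{
    // Called after the uploaded zip has been saved to disk.
    public Task ProcessAsync(string pathToZip)
    {
        // Task.Run schedules the work on the thread pool; the request
        // thread returns immediately instead of blocking on the processing.
        return Task.Run(() => HandleZipFile(pathToZip));
    }

    void HandleZipFile(string pathToZip)
    {
        // Placeholder: unzip, read the contents, write to the database...
    }
}
```

The thread pool throttles how many tasks actually run at once, so 50 simultaneous uploads do not mean 50 simultaneous threads.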

abatishchev
  • Hi :) But starting a new `async task` would mean starting a new process. So, suppose if I have 50 clients requesting my page at a moment == 50 requests to save file == 50 different async task processes. – beerBear Feb 04 '13 at 07:53
    @codebreaker: each task will be a new thread, theoretically, but actually all will happen on the thread pool, just without your manual control. TPL/Rx is pretty smart enough! – abatishchev Feb 04 '13 at 07:55
    I agree with this approach, with Rx you can do all sorts of stuff, like retrying, with an easy, fluent API. Pluralsight has a *GREAT* Rx course: [.NET Reactive Extensions Fundamentals](http://pluralsight.com/training/courses/TableOfContents?courseName=reactive-extensions), or if you don't have a sub., check out [this talk](http://vimeo.com/43659034) by Paul Betts. – khellang Feb 04 '13 at 08:38

I'd suggest a queue system.

When you receive a file, you'll save its path into a thread-synchronized queue. Meanwhile a background worker (or, preferably, another machine) will check this queue for new files and dequeue entries to handle them.

This way you won't launch an unknown number of threads (one per zip file) and can handle the zip files in one location. It also becomes easier to move your zip-handling code to another machine when the load gets too heavy: you just need access to a common queue.

The simplest approach would probably be a static Queue with a lock object: it is the easiest to implement and does not require external resources. But the queue will be lost when your application recycles.

You mentioned that losing zip files is not an option, so this approach is not the best if you don't want to rely on external resources. Depending on your load it may be worth using them anyway - meaning uploading the zip file to common storage on another machine and adding a message to a queue on another machine.

Here's an example with a local queue:

ConcurrentQueue<string> queue = new ConcurrentQueue<string>();

void GotNewZip(string pathToZip)
{
    queue.Enqueue(pathToZip); // Add a new work item to the queue
}

void MethodCalledByWorker()
{
    while (true)
    {
        if (queue.IsEmpty)
        {
            // Presumably no work to be done; wait a few seconds and check again (new iteration)
            Thread.Sleep(TimeSpan.FromSeconds(5));
            continue;
        }

        string pathToZip;
        if (queue.TryDequeue(out pathToZip)) // If TryDequeue returns false, another thread already dequeued the last element
        {
            HandleZipFile(pathToZip);
        }
    }
}

This is a very rough example. Whenever a zip arrives, you add its path to the queue. Meanwhile a background worker (or multiple; the example is thread-safe) will handle one zip after another, getting the paths from the queue. The zip files will be handled in the order they arrive.

You need to make sure that your application does not recycle in the meantime. But that's the case with all resources you keep on the local machine: they'll be lost when your machine crashes.
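One possible way to wire this up is to start the worker loop once at application startup; `Application_Start` in Global.asax is one such place (a hedged sketch - `MethodCalledByWorker` is the loop from the example above):

```csharp
using System;
using System.Threading;

void Application_Start(object sender, EventArgs e)
{
    var worker = new Thread(MethodCalledByWorker)
    {
        IsBackground = true // don't keep the process alive on shutdown
    };
    worker.Start();
}
```

A background thread like this dies with the app pool, which is exactly the recycle caveat mentioned above.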

  • Hi, thanks for replying fast :) But how would I send the file's path to the message queue? Will that have to be executed using a thread too? A sample would help, or any links I can look into? +1 nice approach – beerBear Feb 04 '13 at 07:24
  • That depends on what kind of queue you use. The easiest would probably be to use the `Queue` class of .NET together with a lock `object`. More "professional" (and possibly overkill in your case, I don't know) would be an external queue system on another machine. Accessing these queues of course depends on their libraries. You could theoretically also use a database, but I don't know how you'd synchronize your requests there to make it thread-safe. – JustAnotherUserYouMayKnow Feb 04 '13 at 07:28
  • I am trying to minimize overhead, so I really won't go for a multi-tier approach. **UPDATE:** There certainly won't be more than 9xx devices trying to request the page, so.. :| Please see my comment #1 above (under my question). – beerBear Feb 04 '13 at 07:36
  • I guess a local Queue would still be an option. Just set the worker(s) to barely sleep any time, even when the queue is empty. You'd still need to make sure yourself that the machine does not recycle. But with this approach (instead of starting a new task right away when a zip arrives) there will be less file loss when the machine abruptly disconnects, because one file will be finished after another. – JustAnotherUserYouMayKnow Feb 04 '13 at 07:39
    BTW, you know that .NET 4 has `ConcurrentQueue`? – khellang Feb 04 '13 at 07:55
  • @KristianHellang Just saw [this](http://stackoverflow.com/questions/4551087/how-to-work-threading-with-concurrentqueuet) +1 However can you include an example relative to my case? – beerBear Feb 04 '13 at 07:59
  • Edited my example to use `ConcurrentQueue` instead of `Queue`. Looks a bit cleaner without the messy `lock`s. – JustAnotherUserYouMayKnow Feb 04 '13 at 08:10
  • Yes, IIS queues your requests - but when the queue is full, incoming requests will be declined. I don't know if that is what you want. For an easy solution that's probably the best way for you. A more professional approach includes external storage for your zips, an external message queue and several workers that will handle your zips. That way your systems are decoupled and you could scale out your zip handling as needed. But this might be overkill for you. – JustAnotherUserYouMayKnow Feb 04 '13 at 09:18
  • @JustAnotherUserYouMayKnow i) If the request is declined, will I get a 503 as the response from the server? If yes, then that's fine with me, as the client app handles it accordingly (so no issues). ii) Yeah, a more distributed architecture is what I am not going to implement/try, as it'd be overkill for a small zip-file-handler app. I will try the `Message Queue`. Also, can you recommend a non-.NET-4.0 method (`ConcurrentQueue` is only in 4.0)? I only have access to versions 2.0, 3.0 and 4.0. :( – beerBear Feb 04 '13 at 10:01
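For frameworks before 4.0, the `Queue` class plus a lock object mentioned in the comments can stand in for `ConcurrentQueue`. A hedged sketch (member names are hypothetical; `Queue<T>` is available from .NET 2.0):

```csharp
using System.Collections.Generic;

public class LockedZipQueue
{
    readonly object gate = new object();
    readonly Queue<string> queue = new Queue<string>();

    public void GotNewZip(string pathToZip)
    {
        lock (gate) { queue.Enqueue(pathToZip); } // synchronized enqueue
    }

    // Returns null when the queue is empty, mirroring TryDequeue semantics.
    public string TryDequeue()
    {
        lock (gate)
        {
            return queue.Count > 0 ? queue.Dequeue() : null;
        }
    }
}
```

The worker loop stays the same shape: call `TryDequeue`, sleep briefly when it returns null, otherwise process the file.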

I believe you are optimising prematurely.

You mentioned table-locking - what kind of db are you using? If you add new rows or update existing ones most modern databases in most configurations will:

  1. use row-level locking; and
  2. be fast enough without you needing to worry about locking.

I suggest starting with a simple method:

        //Unzip
        //Do work
        //Save results to database

and getting some proof that it's actually too slow.
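The simple method above could look something like this (all names are placeholders; `ZipFile` requires .NET 4.5+, so on earlier frameworks a third-party library such as SharpZipLib would fill the same role):

```csharp
using System.IO;
using System.IO.Compression;

void ProcessUpload(string pathToZip, string extractDir)
{
    // Unzip
    ZipFile.ExtractToDirectory(pathToZip, extractDir);

    foreach (string file in Directory.GetFiles(extractDir))
    {
        // Do work
        string[] lines = File.ReadAllLines(file);

        // Save results to database (placeholder method)
        SaveToDatabase(lines);
    }
}
```

If profiling then shows this is too slow or contends on the table, that's the point to reach for a queue or a separate worker.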

tymtam