How to delegate long background tasks from web, and recover control when done

Question

We have an ERP in which, twice a month, all the orders from the last two weeks have to be billed. Our clients select all those orders, press the "generate bills" button, and a series of sequential AJAX HTTP requests is made, one per invoice, while a pop-up message informs them of the progress.

First, all the invoices are generated sequentially in the DB, as described above. Once that is done, the PDF files are generated, also with sequential AJAX requests.

This works as long as the users leave that window untouched. If they leave the page or close it, the whole process, which might take a few minutes if there are many invoices to generate, is stopped.

If the process is stopped in the middle, many invoices may be left without a generated PDF file. This is critical because when they send all those invoices to be printed, the job takes much longer if the PDF content must be generated on the fly and sent to the printer than if it is read from an existing file.

I could change the process so that after one invoice is generated, the next action is to generate its file, and so on. But I wonder if there's some way to send the process to the background, via system(), exec() or similar, and get notified in the same web app when the process is done, regardless of the user's decision to leave the billing page to do other tasks.

Well a queue (beanstalkd for example) to generate the PDF's would be a good start and somewhere to store "did a user receive a notification of those generated PDF's" — Daan, Aug 23 '18 at 14:15
Interesting. I will do a research on queues and beanstalkd. Second part, are you talking about databases? — luis.ap.uyen, Aug 24 '18 at 08:17

score 6 · Accepted Answer · answered Aug 28 '18 at 12:04

Such tasks are not a good fit for a web request, because they hold the request open for a long time. On a single-threaded server such as Node.js the situation gets even worse, since heavy work inside a request handler blocks the event loop for all other requests.

Anyway, this is one of the simplest ways to do it:

1. Send an AJAX request with the list of order IDs to the server. The server simply inserts these order IDs with status PENDING into, let's say, a DB table "ORDERINVOICE", and responds with 200 to say the request was accepted.
2. A background job queries the ORDERINVOICE table, let's say every 5 seconds, looking for records with status PENDING. This job generates the invoice and marks the status as INVOICED.
3. Another background job queries the ORDERINVOICE table, let's say every 5 seconds, looking for records with status INVOICED. This job generates the PDF and marks the status as DONE.
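A minimal sketch of the three steps above, assuming PHP 7.4+ with the PDO SQLite driver. The in-memory SQLite database, the lowercase order_invoice table and the enqueueOrders/runPass helpers are made up for illustration, and the two callbacks only stand in for the real invoice and PDF code. In a real setup each pass would be run by a cron entry or a looping CLI script rather than inline.

```php
<?php
// Hypothetical schema: SQLite in memory stands in for the ERP database.
$db = new PDO('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$db->exec('CREATE TABLE order_invoice (order_id INTEGER PRIMARY KEY, status TEXT NOT NULL)');

// Step 1: the web request only records the work and returns immediately.
function enqueueOrders(PDO $db, array $orderIds): void
{
    $stmt = $db->prepare("INSERT INTO order_invoice (order_id, status) VALUES (?, 'PENDING')");
    foreach ($orderIds as $id) {
        $stmt->execute([$id]);
    }
}

// Steps 2 and 3: one pass of a background job. It picks every row in $from,
// runs $work on it, and moves the row to $to only when $work reports success.
function runPass(PDO $db, string $from, string $to, callable $work): int
{
    $select = $db->prepare('SELECT order_id FROM order_invoice WHERE status = ?');
    $select->execute([$from]);
    $ids = $select->fetchAll(PDO::FETCH_COLUMN);

    $update = $db->prepare('UPDATE order_invoice SET status = ? WHERE order_id = ? AND status = ?');
    $moved = 0;
    foreach ($ids as $id) {
        if ($work((int) $id)) {
            $update->execute([$to, $id, $from]);
            $moved += $update->rowCount();
        }
    }
    return $moved;
}

enqueueOrders($db, [101, 102, 103]);

// Placeholder for the real invoice creation: always succeeds here.
$invoiced = runPass($db, 'PENDING', 'INVOICED', fn (int $id): bool => true);

// Placeholder for the real PDF generation: pretend order 102 fails.
$done = runPass($db, 'INVOICED', 'DONE', fn (int $id): bool => $id !== 102);

echo "invoiced: $invoiced, pdf done: $done", PHP_EOL;
```

Because a row only changes status after its step succeeds, order 102 stays INVOICED and is picked up again on the next PDF pass, so a failure in the middle does not leave the batch in an unknown state.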

Now for updating the web UI.

For real-time notifications you can use WebSockets, which create a persistent connection to your server and enable bidirectional communication.

However, if you can afford some lag in updating clients about progress, another way is to poll from the web UI every 5 or 6 seconds with an AJAX request that returns a status summary of the ORDERINVOICE table, e.g. Pending: 10, In progress: 20, Done: 3.
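The polling endpoint can be as small as one GROUP BY query. Below is a sketch assuming PHP with the PDO SQLite driver. The statusCounts helper, the order_invoice table name and the demo rows are hypothetical, and the status names follow the PENDING/INVOICED/DONE scheme from the steps above.

```php
<?php
// Hypothetical helper: summarise the rows per status for the polling request.
function statusCounts(PDO $db): array
{
    // Start from zero so the client always receives every key.
    $counts = ['PENDING' => 0, 'INVOICED' => 0, 'DONE' => 0];
    $rows = $db->query('SELECT status, COUNT(*) AS n FROM order_invoice GROUP BY status');
    foreach ($rows as $row) {
        $counts[$row['status']] = (int) $row['n'];
    }
    return $counts;
}

// Demo data in an in-memory database (stand-in for the real table).
$db = new PDO('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$db->exec('CREATE TABLE order_invoice (order_id INTEGER PRIMARY KEY, status TEXT NOT NULL)');
$db->exec("INSERT INTO order_invoice (order_id, status) VALUES (1, 'PENDING'), (2, 'DONE'), (3, 'DONE')");

// This is the body the AJAX poll would receive.
$json = json_encode(statusCounts($db));
echo $json, PHP_EOL;
```

The browser can call this every few seconds and stop polling once PENDING and INVOICED both reach zero.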

Scaling Needs

The above implementation is very simple and can be done without any middleware. However, if you plan to scale in the long run and want to avoid unnecessary queries to the DB, you will have to go fully async, at the cost of more maintenance. (This is the advisable method for a system doing a large amount of processing.)

Full async way using queueing solutions like Kafka, RabbitMQ, etc.

1. Step 1 above remains the same (to provide persistent storage).
2. Create a producer which simply reads PENDING records and pushes the orders to an INVOICING QUEUE.
3. Depending on the scale, you can add n consumers to this INVOICING QUEUE that do the invoicing work in parallel and, once done, update the status and push the record to another PDFQUEUE.
4. Again, to speed up and scale the process, you will have consumers listening to this PDFQUEUE and doing the PDF generation work. Once done, they update the status and push the message to a NOTIFYQUEUE.
5. The WebSocket server is the consumer of the NOTIFYQUEUE and simply updates the web browser about the done status. You will need to pass a unique user/visitor ID for this. Check https://socket.io/ for WebSockets.

Thanks a lot! This is what I need. Since it's not an intensive task (just fortnightly), I think there's no need to use queues, nor websockets. I think just with ajax polling will do the trick. — luis.ap.uyen, Aug 28 '18 at 12:43

Mikhail Kulygin · Answer 2 · 2018-08-27T14:17:26.887

I recommend using a queue service, for example RabbitMQ, to hold all the tasks.

You can create two queues:

The first one is for generating invoices in the DB. Add items to this queue after the client clicks the "Generate bills" button. Once all tasks have been sent to the queue, a pop-up message immediately informs the user about the number of bills and the estimated generation time. You do not have to wait until the end of the generation process.
The second one is for generating PDF files. It receives an item from the first queue after an invoice has been generated successfully in the DB. A worker (a while-true process) gets items from this queue, generates a PDF, and marks the item as finished if the PDF is created. Otherwise, the worker marks the item as not finished and increases its attempt counter. Once the maximum number of attempts is reached, the worker marks the item as failed and deletes it from the second queue.
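The attempt counter for the second queue can be sketched without a broker. This is only an in-memory illustration using PHP's SplQueue: MAX_ATTEMPTS, processWithRetries and the fake PDF callback are assumptions, and with RabbitMQ the counter would have to travel with the message or live in the DB.

```php
<?php
const MAX_ATTEMPTS = 3; // assumed retry limit

// Hypothetical worker loop: retry failed items until they succeed or hit the limit.
function processWithRetries(array $invoiceIds, callable $generatePdf): array
{
    $queue = new SplQueue();
    foreach ($invoiceIds as $id) {
        $queue->enqueue(['id' => $id, 'attempts' => 0]);
    }

    $result = ['finished' => [], 'failed' => []];
    while (!$queue->isEmpty()) {
        $item = $queue->dequeue();
        $item['attempts']++;

        if ($generatePdf($item['id'], $item['attempts'])) {
            $result['finished'][] = $item['id'];   // PDF created
        } elseif ($item['attempts'] >= MAX_ATTEMPTS) {
            $result['failed'][] = $item['id'];     // give up and drop the item
        } else {
            $queue->enqueue($item);                // not finished: try again later
        }
    }
    return $result;
}

// Fake generator: 7 works at once, 8 works on the second attempt, 9 never works.
$result = processWithRetries([7, 8, 9], function (int $id, int $attempt): bool {
    return $id === 7 || ($id === 8 && $attempt >= 2);
});

print_r($result);
```

The failed list is what you would log and show to the user, so that broken invoices do not block the rest of the batch.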

As a result, you can see how many items are being generated at any moment, log unsuccessful generations, and control the whole process.

A simple example:

SENDER

Create a queue and send an item to it. Start the sender process before starting the consumer.

$params = array(
    'host' => 'localhost',
    'port' => 5672,
    'vhost' => '/',
    'login' => 'guest',
    'password' => 'guest'
);

$connection = new AMQPConnection($params);
$connection->connect();
$channel = new AMQPChannel($connection);

$exchange = new AMQPExchange($channel);
$exchange->setName('ex_hello');
$exchange->setType(AMQP_EX_TYPE_FANOUT);
$exchange->setFlags(AMQP_IFUNUSED | AMQP_AUTODELETE);
$exchange->declare();

$queue = new AMQPQueue($channel);
$queue->setName('invoice');
// AMQP_AUTODELETE: the broker deletes the queue once its last consumer disconnects.
// AMQP_DURABLE: the queue definition survives a broker restart.
$queue->setFlags(AMQP_IFUNUSED | AMQP_AUTODELETE | AMQP_DURABLE);
$queue->declare();
$queue->bind($exchange->getName(), '');

$result = $exchange->publish(json_encode("Invoice_ID"), '');

if ($result)
    echo 'sent'.PHP_EOL;
else
    echo 'error'.PHP_EOL;
# after sending an item close the connection
$connection->disconnect();

CONSUMER

The worker must connect to RabbitMQ, read the queue, do the job, and report the result:

$params = array(
    'host' => 'localhost',
    'port' => 5672,
    'vhost' => '/',
    'login' => 'guest',
    'password' => 'guest'
);

$connection = new AMQPConnection($params);
$connection->connect();

$channel = new AMQPChannel($connection);

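$exchange = new AMQPExchange($channel);
$exchange->setName('ex_hello');
$exchange->setType(AMQP_EX_TYPE_FANOUT);
// Same flags as in the sender: the broker rejects redeclaring an existing exchange with different settings
$exchange->setFlags(AMQP_IFUNUSED | AMQP_AUTODELETE);
$exchange->declare();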

$queue = new AMQPQueue($channel);
$queue->setName('invoice');
// AMQP_AUTODELETE: the broker deletes the queue once its last consumer disconnects.
// AMQP_DURABLE: the queue definition survives a broker restart.
$queue->setFlags(AMQP_IFUNUSED | AMQP_AUTODELETE | AMQP_DURABLE);
$queue->declare();
$queue->bind($exchange->getName(), '');

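while (true) {
    // get() returns a falsy value when the queue is empty
    if ($envelope = $queue->get()) {
        $message = json_decode($envelope->getBody());
        echo "delivery tag: ".$envelope->getDeliveryTag().PHP_EOL;
        if (doWork($message)) {
            // job done: remove the message from the queue
            $queue->ack($envelope->getDeliveryTag());
        } else {
            // not successful result, we need to redo this job
            $queue->nack($envelope->getDeliveryTag(), AMQP_REQUEUE);
        }
    } else {
        // nothing to do: wait a little instead of spinning the CPU
        usleep(200000);
    }
}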

$connection->disconnect();

A little bit of explaining and commenting on/off the code does miracles. Just saying. — Cemal, Aug 27 '18 at 12:25

score 2 · Answer 3 · answered Aug 24 '18 at 08:34

I think you should use a background task runner. When the user clicks the button, make a single AJAX call to the backend telling it to add a new task (multiple tasks in your case) to your task queue.

Yes, you should maintain a task queue for this. It could be a database table with attributes like task_type, task_data, task_status, task_dependency.

Since you have multiple steps, you could add them as 2 main tasks:

Create all invoices
Generate PDF report (add the ID of the task above as a dependency of this task)

Then there should be a worker process that watches your task queue and executes the tasks. It checks the task queue table at a fixed interval (every 1 min), and if there are tasks with status 0 (pending) that do not depend on a task that has not been executed yet, it executes them. The task runner continues until there are no tasks left to execute.
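Here is a rough sketch of such a task table and one run of the worker, assuming PHP with the PDO SQLite driver. The column names follow the attributes listed above. Everything else (the task table name, nextRunnableTask, runWorkerOnce, the in-memory database and the empty callback) is made up for illustration.

```php
<?php
$db = new PDO('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
// task_status: 0 = pending, 1 = completed. task_dependency: id of another task, or NULL.
$db->exec('CREATE TABLE task (
    id INTEGER PRIMARY KEY,
    task_type TEXT NOT NULL,
    task_data TEXT,
    task_status INTEGER NOT NULL DEFAULT 0,
    task_dependency INTEGER
)');

// Pick the oldest pending task whose dependency (if any) is already completed.
function nextRunnableTask(PDO $db): ?array
{
    $sql = 'SELECT t.* FROM task t
            LEFT JOIN task d ON d.id = t.task_dependency
            WHERE t.task_status = 0
              AND (t.task_dependency IS NULL OR d.task_status = 1)
            ORDER BY t.id LIMIT 1';
    $row = $db->query($sql)->fetch(PDO::FETCH_ASSOC);
    return $row === false ? null : $row;
}

// One wake-up of the worker: run tasks until nothing runnable is left.
function runWorkerOnce(PDO $db, callable $execute): array
{
    $ran = [];
    $markDone = $db->prepare('UPDATE task SET task_status = 1 WHERE id = ?');
    while (($task = nextRunnableTask($db)) !== null) {
        $execute($task);
        $markDone->execute([$task['id']]);
        $ran[] = $task['task_type'];
    }
    return $ran;
}

// The PDF task is inserted first on purpose; it still has to wait for task 2.
$db->exec("INSERT INTO task (id, task_type, task_data, task_dependency) VALUES (1, 'generate_pdfs', '[101,102]', 2)");
$db->exec("INSERT INTO task (id, task_type, task_data, task_dependency) VALUES (2, 'create_invoices', '[101,102]', NULL)");

$ran = runWorkerOnce($db, function (array $task): void {
    // the real invoice / PDF work would go here
});

echo implode(' -> ', $ran), PHP_EOL;
```

A cron entry or a small looping script would call runWorkerOnce at the chosen interval. This sketch marks a task completed even if the callback does nothing, so real code would need a failure status as well.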

From the front end you could use AJAX long polling to check whether your PDF generation task has status 1 (completed). If it does, you can notify the user.

For this you can develop your own simple task runner (maybe in Go or Node.js), or you could use an available task runner.

Thank you! Sounds rather cumbersome, but for now it's the best answer because it approaches better my problem. — luis.ap.uyen, Aug 24 '18 at 08:47

score 1 · Answer 4 · answered Aug 24 '18 at 08:09

Most background tasks are done with cron jobs. Cron runs in the background and executes whatever command you give it. Jobs can be scheduled to run at any time on the server side. In your case, you can schedule it twice per month with the following expression:

0 0 1,15 * *  ---Command here---

If you are not familiar with scheduling cron jobs, this might help.

Now to the point: notifying the user after the job finishes. You need to store information like start_time, end_time and status in a database table for each cron run. At the start of the run the status stays 0, and on finish it should change to 1.

The user can be notified at any time using this information from the database.

Lovepreet Singh

4,792
1
18
36

I'm not sure if cron jobs would really help here, because these are scheduled at fixed times, and the actions I talked about happen anytime, whenever the user needs. – luis.ap.uyen Aug 24 '18 at 08:15
But if you want to trigger the process through the user interface, then it is mandatory to stay connected until the process finishes. Otherwise the process terminates partway without completing. – Lovepreet Singh Aug 24 '18 at 18:09

X 47 48 - IR · Answer 5 · 2018-08-29T10:09:42.643

You can run your PHP scripts as background processes with this function:

    function Execute($CMD)
    {
        $OS = strtoupper(PHP_OS_FAMILY);

        if ($OS == 'WINDOWS') {
            return pclose(popen("start /B {$CMD}", "r"));
        } else {
            return shell_exec("{$CMD} > /dev/null 2>/dev/null &");
        }
    }

After your command has started, log some information about it (e.g. whether it is finished or still being processed).

There are several ways to log this information:

Log it in a single file
Use an in-memory DB (Redis)
Or MySQL

While it changes, you can read the progress from that log file/DB.
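For the single-file variant, a sketch could look like this. The writeProgress and readProgress helpers, the JSON layout and the temp-file path are my own assumptions. The background script would call writeProgress after each item, and the page the user is on would call readProgress from an AJAX request.

```php
<?php
// Hypothetical helper for the background script: record how far the job is.
function writeProgress(string $file, int $done, int $total): void
{
    $state = [
        'done'   => $done,
        'total'  => $total,
        'status' => $done >= $total ? 'finished' : 'processing',
    ];
    // Write to a temp file and rename it, so a reader never sees a half-written file.
    $tmp = $file . '.tmp';
    file_put_contents($tmp, json_encode($state));
    rename($tmp, $file);
}

// Hypothetical helper for the web request: report the last known state.
function readProgress(string $file): array
{
    if (!is_file($file)) {
        return ['done' => 0, 'total' => 0, 'status' => 'unknown'];
    }
    return json_decode(file_get_contents($file), true);
}

// Demo: one file per job, here placed in the system temp directory.
$file = sys_get_temp_dir() . '/billing_progress_' . uniqid() . '.json';

writeProgress($file, 3, 10);
$midway = readProgress($file);

writeProgress($file, 10, 10);
$final = readProgress($file);

unlink($file);

echo $midway['status'], ' then ', $final['status'], PHP_EOL;
```

Using one file per job keeps two users' jobs from overwriting each other's progress.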

I have done this for my videos: when a user uploads a video I need to create tooltip thumbnails, a poster, etc. using FFmpeg. That takes too long to wait for, so I just run my scripts as background processes.

Blacksmith · Answer 6 · 2018-08-29T00:57:08.983

Option 1 : Queue using scheduled cron jobs

The challenge with this is that your users have no control over when the cron job runs, so if they miss the window in which the job is scheduled, they can do nothing. It is also resource intensive on the web server if you do the calculations in the application, which will still generate the PDFs.

A possible solution to this challenge is to send reminders via email, app notifications and/or SMS when the window to run the cron job approaches. Find a balance in distributing the load between your web and database servers.

Add a field in your database with a flag, e.g. requiresProcessing, whose values show the current state (e.g. do nothing, initiate Processing, processing, complete, incomplete) and indicate what needs to be done to an order.

Once your users select the orders for which they want to generate invoices, change the flag to initiate Processing. Notify your users on the interface that the job has been queued for processing, and possibly give them an estimate of the time it will take to complete.

Create a PHP script that queries your DB for records that meet this criterion: the user has requested invoice/PDF generation.

Set up a cron job on your server that runs when your server is not very busy. This cron job runs the PHP script above. It retrieves the records (work orders) that need invoices, does the computations on the fields that will appear on the invoice, and then creates the PDF. As each invoice is created and completed, change the flag. If something goes wrong and the invoice is not created, store that state.

On your web application UI, you could have some form of notification (status update) using an AJAX request that shows the progress, assuming your user made the request just before the cron job started and hence can see the update.

OPTION 2 : Client Side using HTML5 webworkers and local storage

Library to generate PDF using JS

This thread may help you with some challenges faced when using web workers

Transfer the load of creating invoices to the client/browser using HTML5 web workers. The user selects the orders, and your app stores a key-value pair (uniqueId:state) for each order in the browser's local storage.

Have a web worker that generates the invoices. Remove the uniqueId of orders that have been successfully completed from local storage. Local storage is persistent, so even if they close the window that data remains. Once they reopen the window, have a service that checks the local storage and continues generating invoices.

Option 2, extended: as Option 2, but develop a browser extension that can perform these tasks even if the user has logged out of your app or closed the browser.

These two threads should guide you.

Chrome extension: accessing localStorage in content script

Chrome extension that runs even if chrome closed
