
I have read many similar questions concerning cancelling a POST request with jQuery, but none seem to be close to mine.

I have your everyday form that has a PHP-page as an action:

<form action="results.php" method="post">
  <input name="my-input" type="text">
  <input type="submit" value="submit">
</form>

Processing results.php on the server side, based on the POST data from the form, takes a long time (30 seconds or more, and we expect this to increase because our search space will grow in the coming weeks). We are accessing a BaseX server (version 7.9, not upgradable) that contains all the data. User-generated XPath code is submitted through the form, and the action URL then sends that XPath code to the BaseX server, which returns the results. From a usability perspective, I already show a "loading" screen so users at least know that the results are being generated:

$("form").submit(function() {
  $("#overlay").show();
});

<div id="overlay"><p>Results are being generated</p></div>

However, I also want to give users the option to press a button to cancel the request, and to cancel the request when a user closes the page. Note that in the former case (on button click) the user should stay on the same page, be able to edit their input, and immediately re-submit their request. It is paramount that when they cancel the request, they can also immediately resend it: the server should really abort, and not finish the current query before being able to process a new one.

I figured something like this:

$("form").submit(function() {
  $("#overlay").show();
});
$("#overlay button").click(abortRequest);
$(window).unload(abortRequest);

function abortRequest() {
  // abort correct request
}

<div id="overlay">
  <p>Results are being generated</p>
  <button>Cancel</button>
</div>

But as you can see, I am not entirely sure how to fill in abortRequest to make sure the POST request is aborted and terminated, so that a new query can be sent. Please fill in the blanks! Or would I need to .preventDefault() the form submission and instead do an ajax() call from jQuery?


As I said, I also want to stop the process server-side, and from what I read I need exit() for this. But how can I exit another PHP function? For example, let's say that in results.php I have a processing script and I need to exit that script: would I do something like this?

<?php
  if (isset($_POST['my-input'])) {
    $input = $_POST['my-input'];
    function processData() {
      // A lot of processing
    }
    processData();
  }

  if (isset($_POST['terminate'])) {
    function terminateProcess() {
      // exit processData()
    }
  }

and then do a new ajax request when I need to terminate the process?

$("#overlay button").click(abortRequest);
$(window).unload(abortRequest);

function abortRequest() {
  $.ajax({
    url: 'results.php',
    data: {terminate: true},
    type: 'post',
    success: function() { alert("terminated"); }
  });
}

I did some more research and I found this answer. It mentions connection_aborted() and also session_write_close(), and I'm not entirely sure which is useful for me. I do use SESSION variables, but I don't need to write values away when the process is cancelled (though I would like to keep the SESSION variables active).

Would this be the way? And if so, how do I make one PHP function terminate the other?


I have also read into WebSockets, and it seems like something that could work, but I don't like the hassle of setting up a WebSocket server, as this would require me to contact our IT guy, who requires extensive testing on new packages. I'd rather keep it to PHP and JS, without third-party libraries other than jQuery.

Considering most comments and answers suggest that what I want is not possible, I am also interested to hear alternatives. The first thing that comes to mind is paged Ajax calls (similar to many web pages that serve search results, images, what-have-you in an infinite scroll). A user is served a page with the first X results (e.g. 20), and when they click a button "show next 20 results", those are appended. This process can continue until all results are shown. Because it is useful for users to get all results, I will also provide a "download all results" option. This will take very long as well, but at least users should be able to go through the first results on the page itself. (The download button should thus not disrupt the paged Ajax loads.) It's just an idea, but I hope it gives some of you some inspiration.
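Roughly, I imagine the client side of that idea like this (untested; the start/size parameters and the html/more response fields are just how I picture it):

// rough, untested sketch of the paged idea; parameter names are made up
var start = 0, pageSize = 20;

$("#show-more").click(function() {
  $.getJSON("results.php", {start: start, size: pageSize}, function(res) {
    $("#results").append(res.html);           // append the next batch of results
    start += pageSize;
    if (!res.more) $("#show-more").hide();    // no further pages available
  });
});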

Bram Vanroy
  • As far as I'm aware there's no way to cancel a request once the server starts processing it (I'd happily be proven wrong though!). I can think of some out of the box solutions, but they wouldn't be worth it for a 15-second process. You could try looking at web sockets rather than ajax. – jxmallett Apr 19 '16 at 11:02
  • Yes web sockets can be a good solution. – Himel Nag Rana Apr 19 '16 at 11:09
  • This happens automatically. PHP kills the code once the user cancels it. But, if not, you can read http://stackoverflow.com/a/16592945/2729937 – Ismael Miguel Apr 19 '16 at 11:51
  • What kind of poorly written back end takes 30 seconds to process? You're trying to solve this on the front end, but you have a back-end problem. Plain and simple. That said, you need a middleware: you submit a request, it is given an ID and all useful information (IP address comes to mind, an on-the-fly generated cookie, anything that helps retrieving the request's origin), and only then is the query launched. – Christian Bonato Apr 21 '16 at 00:09
  • @BramVanroy — Sorry, that sounded a little harsh. Still, isn't there something that can be done in terms of pre-processing? Some way to dispatch those 50 million tokens? My guess is, it's going to be faster to query 4 or 5 BaseX instances concurrently, instead of only one. Surely there must be some kind of dispatchable hierarchy in 50 million entries... – Christian Bonato Apr 21 '16 at 10:45
  • Show a fake loading state to the user... – itzmukeshy7 Apr 21 '16 at 12:37
  • Christian Bonato is right, 30 seconds smells bad, especially since you are expecting it to go up. In a commercial, public-facing environment, such a delay would be unacceptable. You would need to consider sharding your database, or changing the database engine altogether (XML is definitely not the best solution for such a large volume of data). Now I understand that this is a CS research project, so the expectations are not the same. – RandomSeed Apr 24 '16 at 13:42
  • @RandomSeed XML is heavy-weight, we know that. However, in linguistic data, corpora are almost always delivered as XML, with specific tags and attributes for words and word parts. It would take too much time to convert all these files to another format (what would you suggest?). BaseX is a great XML database, and I have yet to read about a more promising XML back-end. Suggestions are welcome, but we've put quite some time into this and - even though we are not programmers ourselves, not by education at least - have not come to a better alternative. (Read about our efforts in the paper above.) – Bram Vanroy Apr 25 '16 at 07:25
  • I am afraid you fell in a common pitfall. XML is a great format for data *exchange*, not for data *storage*, and even less for data *querying*. Sure, XPath is surprisingly efficient on small to medium data sets, but for this industrial volume of data, you may consider first converting your data to a regular database engine (any traditional RDBMS would do). I like your project, I'd be interested in contributing, in case you are recruiting ;) – RandomSeed Apr 25 '16 at 22:42
  • Another less dramatic option could be splitting your data into several smaller files. Querying these files could be done in parallel on different physical servers, or even on the same machine (so as to use multiple CPU's). This is called [sharding](https://en.wikipedia.org/wiki/Shard_(database_architecture)), and it doesn't look like BaseX supports it natively, this should be done manually. Meh. – RandomSeed Apr 25 '16 at 22:48

6 Answers


To my understanding, the key points are:

  • You cannot cancel a specific request once a form is submitted, because on the client side you have nothing that identifies the state of the request (whether it has been posted, whether it is being processed, etc.). The only way to cancel it is to reset the $_POST variables and/or refresh the page; the connection will then be broken and the previous request will not be completed.

  • In your alternative solution, when you send another Ajax call with {terminate: true}, results.php can stop processing with a simple die(). But as it will be an async call, you cannot map it to the previous form submit, so in practice this will not work.

  • Probable solution: submit the form with Ajax. With jQuery ajax you will have an xhr object which you can abort() upon window unload.

UPDATE (upon the comment):

  • A synchronous request is when your page blocks (all user actions) until the result is ready. Pressing a submit button in the form does a synchronous call to the server by submitting the form, by definition [https://www.w3.org/TR/html-markup/button.submit.html].

  • Now, once the user has pressed the submit button, the connection from browser to server is synchronous, so it will not be released until the result is there. While the submit is in progress, any other call made to the server has no reference to this pending operation, as it has not finished. That is why sending a termination call with Ajax will not work.

  • Thirdly: for your case you can consider the following code example:

HTML:

<form action="results.php">
  <input name="my-input" type="text">
  <input id="resultMaker" type="button" value="submit">
</form>

<div id="overlay">
  <p>Results are being generated</p>
  <button>Cancel</button>
</div>

JQUERY:

<script type="text/javascript">
    var jqXhr = '';

    $('#resultMaker').on('click', function(){

      $("#overlay").show();

      jqXhr = $.ajax({
        url: 'results.php',
        data: $('form').serialize(),
        type: 'post',
        success: function() {
           $("#overlay").hide();
        }
      });
    });

    var abortRequest = function(){
      if (jqXhr != '') {
        jqXhr.abort();
      }
    };

    $("#overlay button").on('click', abortRequest);
    window.addEventListener('unload', abortRequest);
</script>

This is example code - I just used your code examples and changed a few things here and there.

Himel Nag Rana
  • :-) ok let me expand my answer a bit. – Himel Nag Rana Apr 19 '16 at 11:56
  • Not sure why you mix vanilla JS and jQuery. Also you only post the data to the url, but you don't redirect after submitting. – Bram Vanroy Apr 20 '16 at 09:15
  • @BramVanroy, **First** thing is the vanilla JS part: jQuery's `unload` method is deprecated as of v1.8, so the safer option is the traditional way in that case. **Secondly**, you can do the redirect in the `success` function using `window.location.href=`; I thought redirection was not an issue, as I mentioned in my point: redirection means a synchronous connection, and that cannot be aborted. – Himel Nag Rana Apr 20 '16 at 09:22
  • I would add to this answer that you will also need to constantly check what `connection_aborted()` returns. You should add such calls at some checkpoints in your code, and when it returns `1` you need to cancel processing. Also, if you run some slow external code, you may need to introduce a lock file so as not to start new processing until the current one aborts. – Ivan Yarych Apr 20 '16 at 18:51
  • Thanks for the reply. I see your point. I tried it out and it works, but obviously I still can't stop the server. Therefore, please see my edit. – Bram Vanroy Apr 21 '16 at 12:29
  • Hi @BramVanroy, I read the edit. Well, my comment would be: without extensive case-by-case work on each call from the client side to the server, stopping the server's processing is not really required. However, you can read through this (http://symcbean.blogspot.com/2010/02/php-and-long-running-processes.html) and this (http://reviewsignal.com/blog/2013/08/22/long-running-processes-in-php/). Thanks. :-) – Himel Nag Rana Apr 22 '16 at 04:25

Himel Nag Rana demonstrated how to cancel a pending Ajax request. Several factors may interfere and delay subsequent requests, as I have discussed earlier in another post.

TL;DR: 1. it is very inconvenient to try to detect that the request was cancelled from within the long-running task itself, and 2. as a workaround you should close the session (session_write_close()) as early as possible in your long-running task, so as not to block subsequent requests.

connection_aborted() cannot be used here. This function is supposed to be called periodically during a long task (typically, inside a loop). Unfortunately, there is just one single significant, atomic operation in your case: the query to the data back end.
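For reference, the usual pattern looks like the sketch below, assuming the work could be split into chunks ($workChunks and processChunk() are purely illustrative); it only helps when there are such checkpoints, which is precisely what a single atomic BaseX query lacks:

<?php
// typical connection_aborted() usage: poll between units of work
ignore_user_abort(true);              // keep running so we can clean up ourselves

foreach ($workChunks as $chunk) {     // $workChunks is illustrative
    processChunk($chunk);             // illustrative unit of work
    echo ' ';
    flush();                          // an abort is only detected when output is attempted
    if (connection_aborted()) {
        // release locks, temporary files, etc., then stop
        exit;
    }
}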

If you applied the procedures advised by Himel Nag Rana and myself, you should now be able to cancel the Ajax request and immediately allow new requests to proceed. The only remaining concern is that the previous (cancelled) request may keep running in the background for a while (not blocking the user, just wasting resources on the server).

The problem could be rephrased to "how to abort a specific process from the outside".

As Christian Bonato rightfully advised, here is a possible implementation. For the sake of the demonstration I will rely on Symfony's Process component, but you can devise a simpler custom solution if you prefer.

The basic approach is:

  1. Spawn a new process to run the query and save the PID in the session. Wait for the process to complete, then return the result to the client

  2. If the client aborts, it signals the server to just kill the process.


<?php // query.php

use Symfony\Component\Process\PhpProcess;

session_start();

if(isset($_SESSION['queryPID'])) {
    // A query is already running for this session
    // As this should never happen, you may want to raise an error instead
    // of just silently killing the previous query.
    posix_kill($_SESSION['queryPID'], SIGKILL);
    unset($_SESSION['queryPID']);
}

$queryString = parseRequest($_POST);

$process = new PhpProcess(sprintf(
    '<?php $result = runQuery(%s); echo fetchResult($result);',
    $queryString
));
$process->start();

$_SESSION['queryPID'] = $process->getPid();
session_write_close();
$process->wait();

$result = $process->getOutput();
echo formatResponse($result);

?>


<?php // abort.php

session_start();

if(isset($_SESSION['queryPID'])) {

    $pid = $_SESSION['queryPID'];
    posix_kill($pid, SIGKILL);
    unset($_SESSION['queryPID']);
    echo "Query $pid has been aborted";

} else {

    // there is nothing to abort, send a HTTP error code
    header($_SERVER['SERVER_PROTOCOL'] . ' 599 No pending query', true, 599);

}

?>


// javascript
function abortRequest(pendingXHRRequest) {
    pendingXHRRequest.abort();
    $.ajax({
        url: 'abort.php',
        success: function() { alert("terminated"); }
    });
}

Spawning a process and keeping track of it is genuinely tricky, this is why I advised using existing modules. Integrating just one Symfony component should be relatively easy via Composer: first install Composer, then the Process component (composer require symfony/process).

A manual implementation could look like this (beware, this is untested, incomplete and possibly unstable, but I trust you will get the idea):

<?php // query.php

    session_start();

    $queryString = parseRequest($_POST); // $queryString should be escaped via escapeshellarg()

    $processHandler = popen("/path/to/php-cli/php asyncQuery.php $queryString", 'r');

    // fetch the first line of output, PID expected
    $pid = fgets($processHandler);
    $_SESSION['queryPID'] = $pid;
    session_write_close();

    // fetch the rest of the output
    while($line = fgets($processHandler)) {
        echo $line; // or save this line for further processing, e.g. through json_encode()
    }
    fclose($processHandler);

?>


<?php // asyncQuery.php

    // echo the current PID
    echo getmypid() . PHP_EOL;

    // then execute the query and echo the result
    $result = runQuery($argv[1]);
    echo fetchResult($result);

?>
RandomSeed
  • I am a bit confused by the first block of code, specifically by the `PhpProcess` (as said, I am not a PHP developer). Am I right in assuming I don't need that (as it's from Symfony), only the PID? Also, it seems that the magic is in `posix_kill`, but how can I make use of the process ID if I'm not using Symfony? I read about `getmypid`, but that doesn't seem [reliable](http://php.net/manual/en/function.getmypid.php). – Bram Vanroy Apr 24 '16 at 16:09
  • I accepted your answer as the bounty is about to expire and your answer seems most hopeful. However, I would like a follow-up on the PhpProcess and process ID question above. Thank you. – Bram Vanroy Apr 25 '16 at 18:59
  • @BramVanroy Sorry, I've been busy lately. See my edit, I hope it helps. If you were worried about PIDs not being unique: a PID is usually a sequential integer up to 32767. If your process lasts long enough for the system to spawn more than this number of processes, then it is time to move to a serious solution (search for "message queuing"). – RandomSeed Apr 25 '16 at 22:26

With BaseX 8.4, a new RESTXQ annotation, %rest:single, was introduced, which allows you to cancel a running server-side request: http://docs.basex.org/wiki/RESTXQ#Query_Execution. It should solve at least some of the challenges you described.
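Should an upgrade ever become possible, a RESTXQ function using the annotation might look roughly as follows (a sketch pieced together from the linked documentation; the path, module namespace and the xquery:eval call are illustrative, not taken from your setup):

module namespace page = 'http://www.example.com/search';

(: %rest:single lets a new call from the same client stop the
   previous, still-running invocation of this function :)
declare
  %rest:POST
  %rest:path('/search')
  %rest:form-param('query', '{$query}')
  %rest:single
  function page:search($query as xs:string) {
    xquery:eval($query)
  };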

The current way to return only a chunk of the result is to pass on the indexes of the first and last hit you want, and to do the filtering in XQuery:

$results[position() = $start to $end]

By returning one more result than requested, the client will know that there are more results. This may be helpful, because computing the total result size is often much more expensive than returning only the first results.
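Concretely, to serve hits 21 to 40 with a page size of 20, the query could request one extra hit and flag whether a further page exists (a sketch; $results stands for your full result sequence):

let $page := $results[position() = 21 to 41]  (: one hit more than the page size :)
return element page {
  attribute more { count($page) > 20 },  (: a 41st hit means another page exists :)
  subsequence($page, 1, 20)
}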

Christian Grün
  • Hi Christian, thank you for your quick reply! Unfortunately we're stuck with 7.9. I have added that bit of information to my initial post. – Bram Vanroy Apr 21 '16 at 13:20
  • Sorry, I see. This requires some more thoughts indeed. – I have just extended my initial reply to give some more feedback on chunking the result. – Christian Grün Apr 21 '16 at 13:23
  • As a reply to your addendum: I found your answer to [this question](http://stackoverflow.com/a/8900472/1150683). So if I were to use `position() = 11 to 20`, will this only show the next ten results? In theory I could then increment the position through PHP, I assume. But does this result in a speed gain? If I click the button (and ask for the next X results), and the position is 20 to 30, will XQuery also search/encounter hits 1-19? Or is BaseX smart enough to know where it left off? – Bram Vanroy Apr 21 '16 at 13:26
  • Exactly. It will be much, much faster. If your query allows results to be iteratively processed. This is e.g. the case for (//*[text() = 'bla'])[position() = 1 to 10], but it’s not possible if you have "blocking operators" like, for example, order by. In the latter case, all results need to be checked in order to determine which one needs to be output. – If you request hits 20-30, your query will also retrieve results 1-19, but this is usually hardly noticeable, compared to the retrieval of the total result set. – Christian Grün Apr 22 '16 at 08:17
  • So if I understand correctly it goes like this. Page load, show the 20 first results. User clicks the button (position is set to 21-40). BaseX finds 0-40 results, but only returns 21-40. User clicks the button again (position set to 41-60). BaseX finds 0-60 but only returns 41-60. And so on. So each time, BaseX has to do a new query, and the more times a user clicks the button, the longer it will take (because eventually they'll be looking for, say, 1021-1040, which means looking for 0-1040). **But** on the plus side, this is more user-friendly and the user probably won't go through ALL results anyway – Bram Vanroy Apr 22 '16 at 09:42
  • (comment was too long), and I assume that - because the button clicks aren't far apart from each other in time - there should be some caching benefits as well? Or is cache emptied on session close? – Bram Vanroy Apr 22 '16 at 09:43
  • The results won’t be cached by default (because your data can always change in the background while you are requesting the next 20 hits). If it turns out that caching would save you time, you could do this via a proxy (e.g. nginx). – Christian Grün Apr 22 '16 at 11:42
  • A last comment: I forgot to mention there will be some sort of caching indeed, because with every query that is run more than once, results will be retrieved from disk and thus be moved to main memory. However, there is no explicit caching in BaseX due to the reasons mentioned above. – Christian Grün Apr 23 '16 at 12:35
  • Hi Christian. Thank you for the comments. I am going to try some things out and will get back to you (here or on the mailing list). Thanks again! – Bram Vanroy Apr 23 '16 at 13:01

I hope I understood this correctly.

Instead of letting the browser "natively" submit the FORM, don't: write JS code that does it instead. In other words (I didn't test this, so interpret it as pseudo-code):

<form action="results.php" onsubmit="return false;">
  <input name="my-input" type="text">
  <input type="submit" value="submit">
</form>

So, now, when that submit button is clicked, nothing will happen.

Obviously, you want your form POSTed, so write JS to attach a click handler to the submit button, collect the values from all input fields in the form (it is NOT nearly as scary as it sounds; check out the first link below), and send them to the server while saving a reference to the request (check the second link below), so that you can abort it (and maybe signal the server to quit as well) when the cancel button is clicked (alternatively, you can simply abandon it by not caring about the results).

Submit a form using jQuery

Abort Ajax requests using jQuery
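Putting the two links above together, a rough sketch could look like this (untested, in the same pseudo-code spirit as the rest of this answer; the #cancel button id is an assumption):

// untested sketch: post the form via Ajax and keep a handle for aborting
var currentXhr = null;

$("form input[type=submit]").click(function() {
  currentXhr = $.post("results.php", $("form").serialize(), function(data) {
    // render the results here
  });
});

$("#cancel").click(function() {
  if (currentXhr) currentXhr.abort();  // client-side abort only; the server keeps going
});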

Alternatively, to make the HTML markup "clearer" relative to its functionality, consider not using the FORM tag at all: otherwise, what I suggested makes its usage confusing (why is it there if it's not being used; know what I mean?). But don't get distracted by this suggestion until you make it work the way you want; it's optional and a topic for another day (it might even relate to changing the architecture of the whole site).

HOWEVER, a thing to think about: what should happen if the form post has already reached the server and the server has already started processing it, making some "world" changes? Maybe your get-results routine doesn't change data, so then that's fine. But this approach probably cannot be used with data-changing POSTs under the expectation that the "world" won't change when the cancel button is clicked.

I hope that helps :)

Hari Lubovac

The user doesn't have to experience this synchronously.

  1. Client posts a request
  2. The server receives the client request and assigns an ID to it
  3. The server "kicks off" the search and responds with a zero-data page and search ID
  4. The client receives the "placeholder" page and starts checking if the results are ready based on the ID (with something like polling or websockets)
  5. Once the search has completed, the server responds with the results the next time it is polled (or notifies the client directly when using WebSockets)

This is fine when performance isn't quite the bottleneck and the nature of processing makes longer wait times acceptable. Think flight search aggregators that routinely run for 30-90 seconds, or report generators that have to be scheduled and run for even longer!

You can make the experience less frustrating if you don't block user interaction, keep users updated on the search progress, and start showing results as they come in, if possible.
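As a rough illustration of steps 3 to 5, with hypothetical start_search.php and check_search.php endpoints (the JSON fields are made up for the sketch):

// client side of the polling variant; endpoints and fields are hypothetical
$("form").submit(function(e) {
  e.preventDefault();
  $.post("start_search.php", $(this).serialize(), function(res) {
    pollResults(res.searchId);                  // the server replies with an ID only
  }, "json");
});

function pollResults(searchId) {
  $.getJSON("check_search.php", {id: searchId}, function(res) {
    if (res.ready) {
      $("#results").html(res.html);             // search finished, show the data
    } else {
      setTimeout(function() { pollResults(searchId); }, 2000);  // poll again in 2 s
    }
  });
}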

Oleg
  • Well, that is great. Unfortunately I don't know a thing or two about web sockets, nor how to implement the idea. With a 200 bounty I think it is fair to ask for some example code, from the request till the results come in? – Bram Vanroy Apr 21 '16 at 06:07

You must solve this conceptually first before writing any code. Here are some things that come to mind offhand:

What does it mean to free up resources on the server?

What constitutes a graceful abort that will free up resources?

Is it enough to kill the PHP process waiting for the query result(s)? If so, the route suggested by RandomSeed could be interesting. Just keep in mind that it will only work on a single server. If you have multiple load-balanced servers, you won't have a way to kill a process on another server (not as easily, at least).

Or do you need to cancel the database request from the database itself? In that case the answer suggested by Christian Grün is of more interest.

Or is it that there is no graceful shutdown and you have to force everything to die? If so, this seems awfully hacky.

Not all clients are going to explicitly abort

Some clients are going to close the browser, but their last request won't come through; some clients will lose their internet connection and leave the service hanging, etc. You are not guaranteed to get an "abort" request when a client disconnects or has gone away.

You have to decide whether to live with potentially unwanted behavior, or to implement additional active state tracking, e.g. the client pinging the server for keepalive.
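For instance, a hypothetical keepalive could be as small as the snippet below (keepalive.php and searchId are assumptions); the server would then reap any search whose last ping is too old:

// hypothetical: ping every 10 seconds while the search page is open
var keepalive = setInterval(function() {
  $.post("keepalive.php", {id: searchId});  // searchId from the initial request
}, 10000);

// call clearInterval(keepalive) once results arrive or the user cancels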

Side notes

  • a query time of 30 seconds or more is potentially long; is there a better tool for the job, so that you won't have to solve this with a hack like this?

  • you are looking for features of a concurrent system, but you're not using a concurrent system; if you want concurrency, use a better tool/environment for it, e.g. Erlang.

Unix One