
Original problem

I am creating an API using Express that queries a SQLite DB and outputs the result as a PDF using the html-pdf module.

The problem is that certain queries might take a long time to process, so I would like to decouple the actual query call from the Node server where Express is running. Otherwise the API might slow down if several clients run heavy queries at the same time.

My idea was to move the execution of the SQLite query into a Python script. The API can then call this script, so Node itself never queries the DB.

Current problem

After quickly creating a Python script that runs a SQLite query and calling it from my API with child_process.spawn(), I found that Express seems to receive an exit code as soon as the Python script starts executing the query.

To confirm this, I created a simple Python script that just sleeps between printing two messages, which isolated the problem.

To reproduce this behavior you can create a python script like this:

print("test 1")
sleep(1)
print("test 2")

Then call it from express like this:

router.get('/async', function(req, res, next) {
    // 'test.py' is a placeholder for the path to the Python script above
    var python = child_process.spawn('python3', ['test.py']);
    var output = "";
    python.stdout.on('data', function(data){ 
        output += data
        console.log(output)
    });

    python.on('close', function(code){ 
        if (code !== 0) {  
          return res.status(200).send(code) 
        }
        return res.status(200).send(output)
    });
});

If you then run the express server and do a GET /async you will get a "1" as the exit code.

However, if you comment out the sleep(1) line, the server successfully returns

test 1
test 2

as the response.

You can even trigger this using sleep(0).

I have tried flushing stdout before the sleep, piping the result instead of using .on('close'), and passing the -u option to python (to use unbuffered streams).

None of this has worked, so I'm guessing there's some mechanism baked into express that closes the request as soon as the spawned process sleeps OR finishes (instead of only when finishing).

I also found this answer related to using child_process.fork(), but I'm not sure whether it would behave differently. This one is very similar to my issue but has no answer.

Main question

So my question is, why does the python script send an exit signal when doing a sleep() (or in the case of my query script when running cursor.execute(query))?

If my supposition is correct that express closes the request when a spawned process sleeps, is this avoidable?

One potential solution I found suggested the use of ZeroRPC, but I don't see how that would make express keep the connection open.

The only other option I can think of is using something like Kue so that my express API will only need to respond with some sort of job ID, and then Kue will actually spawn the python script and wait for its response, so that I can query the result via some other API endpoint.
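To make the job ID idea concrete, here is a minimal sketch of that pattern using only Node built-ins. It does not use Kue or its API: the in-memory Map, the submitJob/getJob names and the job object shape are all made up for illustration, and a real queue would persist jobs and limit concurrency.

```javascript
const { spawn } = require('child_process');
const crypto = require('crypto');

// In-memory job table: job ID -> { status, output, error }.
// Assumption for illustration only: a real queue would persist this,
// since a Map is lost when the server restarts.
const jobs = new Map();

// Start a command in the background and return a job ID immediately.
function submitJob(command, args) {
  const id = crypto.randomBytes(8).toString('hex');
  const job = { status: 'running', output: '', error: '' };
  jobs.set(id, job);

  const child = spawn(command, args);
  child.stdout.on('data', (chunk) => { job.output += chunk; });
  child.stderr.on('data', (chunk) => { job.error += chunk; });

  // 'error' fires if the process could not be started at all.
  child.on('error', (err) => {
    job.status = 'failed';
    job.error += err.message;
  });

  // 'close' fires after the process has exited and its streams have ended.
  child.on('close', (code) => {
    job.status = code === 0 ? 'done' : 'failed';
  });

  return id;
}

// Look up a job by ID; returns undefined for an unknown ID.
function getJob(id) {
  return jobs.get(id);
}
```

In an Express app, one route would call submitJob() and respond with the ID right away, and a second route would return getJob(id) so the client can poll for the result.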

Is there something I'm missing?

Edit:

AllTheTime's comment is correct regarding the sleep issue. After I added from time import sleep it worked. However my sqlite script is not working yet.

Acapulco
  • 3,373
  • 8
  • 38
  • 51
  • It works just fine with sleep, but you need to import it properly at the top of your python script: `from time import sleep`. If you run your script or check the stderr you would see this: `NameError: name 'sleep' is not defined`. Also, the exit code `1` means there is an error. It's possible there is also an error in your sqlite python script that you aren't seeing because you aren't checking stderr. You will be able to tell very quickly if the `code` sent in the `close` event is `1`. Make sure your python script actually works first before blaming node! – Christopher Reid Jan 25 '17 at 22:10
  • Aha, nice catch! It seems you are right. However, it does not work with the sqlite python script even though there are no errors in it. If I run it directly in the console it does work as expected. But as soon as I call it from inside node it fails. Btw, I'm definitely not blaming node, more like blaming myself for missing something :) Thanks for the comment though. – Acapulco Jan 25 '17 at 23:08
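
For anyone hitting the same thing, the stderr check suggested in the first comment could look roughly like this. It is only a sketch: runAndCollect is a made-up helper name, not part of child_process, and the original route would still need to decide what to send back to the client.

```javascript
const { spawn } = require('child_process');

// Run a command and collect stdout, stderr and the exit code.
// (runAndCollect is a hypothetical helper name used for illustration.)
function runAndCollect(command, args) {
  return new Promise((resolve, reject) => {
    const child = spawn(command, args);
    let stdout = '';
    let stderr = '';

    child.stdout.setEncoding('utf8');
    child.stderr.setEncoding('utf8');
    child.stdout.on('data', (chunk) => { stdout += chunk; });
    // Python tracebacks such as a NameError are written to stderr.
    child.stderr.on('data', (chunk) => { stderr += chunk; });

    // 'error' fires if the process could not be started at all.
    child.on('error', reject);

    // 'close' fires after the process has exited and its streams have ended.
    child.on('close', (code) => resolve({ code, stdout, stderr }));
  });
}
```

Inside the route handler you would call something like runAndCollect('python3', [scriptPath]), where scriptPath stands for wherever your script lives, and log or return stderr whenever code is not 0. That makes a Python traceback visible instead of just a bare exit code.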

1 Answer


As it turns out AllTheTime was indeed correct.

The problem was that in my python script I was loading a config.json file, which was loaded correctly when called from the console because the path was relative to the script.

However when calling it from node, the relative path was no longer correct.

After fixing the path it worked exactly as expected.
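
One way to handle this from the Node side (not necessarily the only fix) is to spawn the script with its own directory as the working directory, using the cwd option of child_process.spawn(). A rough sketch, where spawnInScriptDir and its parameters are names made up for illustration:

```javascript
const { spawn } = require('child_process');
const path = require('path');

// Spawn a script with its own directory as the working directory, so that
// relative paths inside the script resolve the same way as when it is
// launched from that directory in a console.
// (spawnInScriptDir is a hypothetical helper; 'python3' is assumed to be on PATH.)
function spawnInScriptDir(scriptPath, interpreter = 'python3') {
  const absolute = path.resolve(scriptPath);
  return spawn(interpreter, [absolute], { cwd: path.dirname(absolute) });
}
```

The alternative is to build the config path inside the Python script from the script's own location, which keeps the script independent of whatever working directory the caller happens to have.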

Acapulco