How do I run code that accesses an Apache OpenWhisk action's in-memory state after the action completes?

Question

I was testing out using IBM Cloud Functions (managed Apache OpenWhisk) to run code in the background after the action finishes, but my callback provided when calling setTimeout didn't run at the right time later, and it never ran at all unless I invoked the function a second time. It ran at that point (late).

Details:

I had two use cases in mind:

Accumulating multiple requests' data in memory and then putting a large object to a Cloud Object Storage bucket once a lot has accumulated or a certain amount of time has passed with no requests.
Managing a database connection, per container, for APIs, so that I can close the connection in an unused container that IBM Cloud Functions hasn't killed yet.

I assumed this would work because I've used other platforms like Google Cloud Run that I noticed running code in the background (using setTimeout etc), seeing the logs for this code in Stackdriver after the request finished. And, there's even an entire library created by an AWS developer advocate that manages MySQL connections in the background on AWS Lambda (https://www.npmjs.com/package/serverless-mysql).

I tested with the following function:

// from https://stackoverflow.com/questions/105034/how-to-create-guid-uuid
function uuidv4() {
    return 'xxxxxxxx-xxxx-4xxx-yxxx-xxxxxxxxxxxx'.replace(/[xy]/g, function (c) {
        var r = Math.random() * 16 | 0, v = c == 'x' ? r : (r & 0x3 | 0x8);
        return v.toString(16);
    });
}

function main() {
    const runId = uuidv4().slice(31, 36);

    console.log(`function started (runId = ${runId})`);

    setTimeout(() => {
        console.log(`after 5s delay (runId = ${runId})`);
    }, 5000);

    return {
        msg: `ok (runId = ${runId})`,
    };
}

And I deployed it with the command ibmcloud fn action update logging-in-background src/index.js --kind nodejs:10.

I created a LogDNA instance and set it as my platform instance so my function logs would go to it. This is what I see in the logs after invoking the function three times with the command ibmcloud fn action invoke logging-in-background --blocking, each 10 seconds apart (CRN redacted):

May 18 17:26:23 functions REDACTED 2020-05-18T21:26:23.956013Z    stdout: function started (runId = 9be7c)
May 18 17:26:23 functions REDACTED Activation record '3589870e8ce44cc089870e8ce4acc018' for entity 'logging-in-background'
May 18 17:26:34 functions REDACTED 2020-05-18T21:26:34.111745Z    stdout: after 5s delay (runId = 9be7c)
May 18 17:26:34 functions REDACTED 2020-05-18T21:26:34.115043Z    stdout: function started (runId = faba6)
May 18 17:26:34 functions REDACTED Activation record 'ac47c067177648f187c0671776b8f1c2' for entity 'logging-in-background'
May 18 17:26:44 functions REDACTED 2020-05-18T21:26:44.248470Z    stdout: after 5s delay (runId = faba6)
May 18 17:26:44 functions REDACTED 2020-05-18T21:26:44.253822Z    stdout: function started (runId = 0af34)
May 18 17:26:44 functions REDACTED Activation record 'bbad3eabb3d64ab1ad3eabb3d61ab1a7' for entity 'logging-in-background'

You can see how when I first invoked the function, it only logged the "function started" message. It didn't log the "after 5s delay" message 5 seconds later. But then, at the beginning of the second invocation, 10 seconds after the first invocation, it finally logs the "after 5s delay" message associated with run 9be7c. The callback for setTimeout appears to never run until, at earliest, the next time the action is invoked.

Is this how Apache OpenWhisk is meant to work by design, or is there something I'm not doing correctly to run code in the background after the action is finished?

"Is this how Apache OpenWhisk is meant to work by design..." -- What you observed and describe is how the system is intended to work. — user6062970, May 18 '20 at 23:10

user6062970 · Answer 1 · 2020-05-21T22:48:00.467

0

Your function is returning before the setTimeout has completed.

Take a look at the docs here:

https://github.com/apache/openwhisk/blob/master/docs/actions-nodejs.md#creating-asynchronous-actions

The correct way to do this:

function main(args) {
    const runId = uuidv4().slice(31, 36);

    console.log(`function started (runId = ${runId})`);
    return new Promise(function(resolve, reject) {
       setTimeout(function() {
         console.log(`after 5s delay (runId = ${runId})`);
         resolve({ msg: `ok (runId = ${runId}) });
       }, 5000);
    })
}

With AWS lambda if when you leave tasks in the event loop, the function doesn't return/terminate the request until the even loop is empty. If you use the special context flag to allow the function to return eagerly, you're not guaranteed that the background processing will actually complete.

To initialize connection pools and reuse them across invocations, you can store the connection in a global variable and make sure it's initialized only.

edited May 21 '20 at 22:48

answered May 18 '20 at 22:03

user6062970

831
4
5

I actually already read that part of the docs. But I'm not trying to create an action that doesn't return until an asynchronous part of it finishes. I'm trying to create an action that is able to finish while still leaving work on the event loop to be completed later. One of my use cases, accumulating data to put a large object to a COS bucket after the action completes, would also involve debouncing. The whole point is to finish an action, keep some data in memory, accept more data in future action invocations, and then eventually send the data to COS. Can OpenWhisk do this? – Matt Welke May 18 '20 at 22:13
There are also common techniques to manage persistent database connections from FaaS platforms, so that a "zombie" container doesn't keep a connection open, with multiple zombie containers eventually saturating a database's connection limit. This is the other use case I mentioned in my question. It's important to me that OpenWhisk can do this too, otherwise I won't be able to safely keep a reference to a connection in an action and reuse it for multiple invocations. I'd have to close and reopen a connection each time. – Matt Welke May 18 '20 at 22:15
The way to do this with OpenWhisk is to invoke an asynchronous action which can complete the processing. Either using the npm openwhisk package to do another invoke, or the REST API directly. For more complicated workflows you might want to look at the OpenWhisk composer (like AWS Step Functions). With AWS lambda when you leave tasks in the event loop, the function doesn't return/terminate the request until the even loop is empty. If you use the special context flag to allow the function to return eagerly, you're not guaranteed that the background processing will actually complete. – user6062970 May 18 '20 at 23:07
"The whole point is to finish an action, keep some data in memory, accept more data in future action invocations, and then eventually send the data to COS. Can OpenWhisk do this?" --- You can leave intra-function state around. For example if you declare a global variable, it's available for all invocations of the function that reuse that container. You can use this approach (of having a global reference) for also creating a database connection or a connection pool. You don't have to create new connections on every invocation. – user6062970 May 18 '20 at 23:12
Thank you for the suggestion to look into other patterns, like invoking another action from within my action, but I don't think that will work for my use case either. I'd like the code that runs later to still have access to the first action's state. For my COS use case, I think my only option would be to store that state in an external database and have a second scheduled action poll that database and put the object to the bucket when enough has been collected. – Matt Welke May 18 '20 at 23:28
For the zombie SQL connection issue (a container not being used but not killed yet), I can't think of a solution if OpenWhisk won't run the code placed on the event loop after the action finishes. There would be no way to close the connection except to wait until OpenWhisk decides to kill the container. It makes me wonder how that serverless-mysql library does this. Somehow, it's able to close connections after the AWS Lambda function returns when it detects that it's a good time to do so. I can try looking into how that library works. – Matt Welke May 18 '20 at 23:28
Well, I feel pretty silly now. Now that I look into what the serverless-mysql library does, it's clear that it doesn't add callbacks to the Node.js event loop. It does everything server side, where the server in this case is the MySQL server. At the end of each lambda invocation, it runs a MySQL query that kills connections to other lambda containers if they haven't been used in a while. Now that you understand what I was trying to do, if you edit your answer to explain the event loop issue, I can mark it as the accepted answer. – Matt Welke May 18 '20 at 23:46

score 0 · Accepted Answer · answered Jun 06 '20 at 19:35

Another user helped me understand what was going by explaining that OpenWhisk does not run code in the event loop after the action finishes. Therefore, it is impossible to do what I wanted to do, where I wanted every action invoked to retrieve data stored in a local variable some time after the action finished.

How do I run code that accesses an Apache OpenWhisk action's in-memory state after the action completes?

2 Answers2