How to trace possible async race conditions in my test code?

Question

I am learning modern JavaScript, and am writing a little API. I plan to host it in MongoDB Stitch, which is a serverless lambda-like environment. I am writing functions in the way that this system requires, and then adding Jest functions to be run locally and in continuous integration.

I am learning Jest as I go, and for the most part, I like it, and my prior experience with Mockery in PHP is making it a fairly painless experience. However, I have an odd situation where my lack of knowledge of JavaScript is stopping my progress. I have a failing test, but it is intermittent, and if I run all of the tests, sometimes it all passes, and sometimes the test that fails changes from one to another. This behaviour, coupled with my using async-await makes me think I am experiencing a race condition.

Here is my SUT:

exports = async function(delay) {
    /**
     * @todo The lambda only has 60 seconds to run, so it should test how
     * long it has been running in the loop, and exit before it gets to,
     * say, 50 seconds.
     */

    let query;
    let ok;
    let count = 0;

    for (let i = 0; i < 5; i++) {
        query = await context.functions.execute('getNextQuery');
        if (query) {
            ok = context.functions.execute('runQuery', query.phrase, query.user_id);
            if (ok) {
                await context.functions.execute('markQueryAsRun', query._id);
                count++;
            }
        } else {
            break;
        }

        // Be nice to the API
        await sleep(delay);
    }

    return count;

    function sleep(ms) {
        return new Promise(resolve => setTimeout(resolve, ms));
    }
};

The content.functions object is a global in Stitch, but as I show below, it is mocked inside Jest. Stitch is not involved in these tests at all.

As you can see, getNextQuery and markQueryAsRun are awaited, as they are defined as async.

Here is my main test:

// Main SUT
const findAndRunQueries = require('./_source');

// Utility test classes
const MongoTester = require('../../test/mongo-tester');
const StitchFuncMocking = require('../../test/stitch-func-mocking');

// Other functions for the integration test
const getNextQuery = require('../getNextQuery/_source');
const markQueryAsRun = require('../markQueryAsRun/_source');

describe('Some integration tests for findAndRunQueries', () => {
    const mongoTester = new MongoTester('findAndRunQueries-integration');
    const stitchFuncMocking = new StitchFuncMocking();

    beforeAll(async () => {
        await mongoTester.connect();
        console.log('Connect');
    });

    afterAll(async () => {
        await mongoTester.disconnect();
        console.log('Disconnect');
    });

    beforeEach(async () => {
        // Set up global values
        global.context = {};
        global.context.services = mongoTester.getStitchContext();
        global.context.functions = stitchFuncMocking.getFunctionsObject(jest);

        // Delete existing mocks
        jest.clearAllMocks();
        stitchFuncMocking.clearMocks();

        // Connect some real implementations
        stitchFuncMocking.setGlobalMock('getNextQuery', getNextQuery);
        stitchFuncMocking.setGlobalMock('markQueryAsRun', markQueryAsRun);

        // Truncate all collections in use
        await mongoTester.emptyCollections(['queries']);
        console.log('Init mocks and clear collections');
    });

    test('end-to-end test with no queries', async () => {
        expect(await findAndRunQueries(0)).toBe(0);
    });

    test('end-to-end test with one successful query', async () => {

        // Here is a query entry
        await mongoTester.getDatabase().collection('queries').insertOne({
            "user_id": 1,
            "phrase": 'hello',
            "last_run_at": null,
            "enabled": true
        });

        var d = await mongoTester.getDatabase().collection('queries').findOne({});
        console.log(d);

        // We need to mock runQuery, as that calls an external API
        stitchFuncMocking.setGlobalMock('runQuery', () => 123);

        // Let's see if we can run a call sucessfully
        expect(await findAndRunQueries(0)).toBe(1);

        // @todo Check that a log entry has been made
    });

});

From this code you can see that getNextQuery and markQueryAsRun are wired to their real implementations (since this is an integration test) but runQuery is a mock, because I don't want this test to make HTTP calls.

For brevity, I am not showing the above code, as I don't think it is needed to answer the question. I am also not showing all of MongoTester or any of StitchFuncMocking (these connect to an in-memory MongoDB instance and simplify Jest mocking respectively).

For database-level tests, I run this MongoTester utility function to clear down collections:

    this.emptyCollections = async function(collections) {
        // Interesting note - how can I do deleteMany without async, but
        // wait for all promises to finish before the end of emptyCollections?
        collections.forEach(async (collectionName) => {
            let collection = this.getDatabase().collection(collectionName);
            await collection.deleteMany({});
        });
    };

This is how I am running the test:

sh bin/test-compile.sh && node node_modules/jest/bin/jest.js -w 1 functions/findAndRunQueries/

The compilation step can be ignored (it just converts the exports to module.exports, see more here). I then run this test plus a unit test inside the functions/findAndRunQueries/ folder. The -w 1 is to run a single thread, in case Jest does some weird parallelisation.

Here is a good run (containing some noisy console logging):

root@074f74105081:~# sh bin/test-compile.sh && node node_modules/jest/bin/jest.js -w 1 functions/findAndRunQueries/
 PASS  functions/findAndRunQueries/findAndRunQueries.integration.test.js
  ● Console

    console.log functions/findAndRunQueries/findAndRunQueries.integration.test.js:18
      Connect
    console.log functions/findAndRunQueries/findAndRunQueries.integration.test.js:42
      Init mocks and clear collections
    console.log functions/findAndRunQueries/findAndRunQueries.integration.test.js:42
      Init mocks and clear collections
    console.log functions/findAndRunQueries/findAndRunQueries.integration.test.js:60
      { _id: 5e232c13dd95330804a07355,
        user_id: 1,
        phrase: 'hello',
        last_run_at: null,
        enabled: true }

 PASS  functions/findAndRunQueries/findAndRunQueries.test.js

Test Suites: 2 passed, 2 total
Tests:       6 passed, 6 total
Snapshots:   0 total
Time:        0.783s, estimated 1s
Ran all test suites matching /functions\/findAndRunQueries\//i.

In the "end-to-end test with one successful query" test, it inserts a document and then passes some assertions. However, here is another run:

root@074f74105081:~# sh bin/test-compile.sh && node node_modules/jest/bin/jest.js -w 1 functions/findAndRunQueries/
 FAIL  functions/findAndRunQueries/findAndRunQueries.integration.test.js
  ● Console

    console.log functions/findAndRunQueries/findAndRunQueries.integration.test.js:18
      Connect
    console.log functions/findAndRunQueries/findAndRunQueries.integration.test.js:42
      Init mocks and clear collections
    console.log functions/findAndRunQueries/findAndRunQueries.integration.test.js:42
      Init mocks and clear collections
    console.log functions/findAndRunQueries/findAndRunQueries.integration.test.js:60
      null

  ● Some integration tests for findAndRunQueries › end-to-end test with one successful query

    expect(received).toBe(expected) // Object.is equality

    Expected: 1
    Received: 0

      64 | 
      65 |         // Let's see if we can run a call sucessfully
    > 66 |         expect(await findAndRunQueries(0)).toBe(1);
         |                                            ^
      67 | 
      68 |         // @todo Check that a log entry has been made
      69 |     });

      at Object.test (functions/findAndRunQueries/findAndRunQueries.integration.test.js:66:44)

 PASS  functions/findAndRunQueries/findAndRunQueries.test.js

Test Suites: 1 failed, 1 passed, 2 total
Tests:       1 failed, 5 passed, 6 total
Snapshots:   0 total
Time:        0.918s, estimated 1s
Ran all test suites matching /functions\/findAndRunQueries\//i.

The null in the log output indicates that the insert failed, but I do not see how that is possible. Here is the relevant code (reproduced from above):

        await mongoTester.getDatabase().collection('queries').insertOne({
            "user_id": 1,
            "phrase": 'hello',
            "last_run_at": null,
            "enabled": true
        });

        var d = await mongoTester.getDatabase().collection('queries').findOne({});
        console.log(d);

I assume the findOne() returns a Promise, so I have awaited it, and it is still null. I also awaited the insertOne() as I reckon that probably returns a Promise too.

I wonder if my in-RAM MongoDB database might not be performing like a real Mongo instance, and I wonder if I should spin up a Docker MongoDB instance for testing.

However, perhaps there is just an async thing I have not understood? What can I dig into next? Is it possible that the MongoDB write concerns are set to the "don't confirm write before returning control"?

Also perhaps worth noting is that the "Disconnect" message does not seem to pop up. Am I not disconnecting from my test MongoDB instance, and could that cause a problem by leaving an in-memory server in a broken state?

In order to keep the question self-contained, I have tried to put all relevant code in the post. However, if anyone wants a fuller expression of the problem, [the repo is here](https://github.com/halfer/stackwatcher). — halfer, Jan 18 '20 at 16:31
unfortunately, has no experience with Stitch and serverless. just wild guess: may you `console.log(ok)` to ensure it's not about `if (ok)` somehow? — skyboyer, Jan 18 '20 at 16:54
@skyboyer: thanks for your reply. Experience with Stitch and Serverless is not needed at all for this question. Neither are used when the tests are running. — halfer, Jan 18 '20 at 17:46
I will undertake to add extra logging, it's not a bad idea. I will see if I can do that tomorrow. — halfer, Jan 18 '20 at 17:47
My next thought is that I should try to rewrite the tests so that the `beforeAll`, `afterAll`, `beforeEach`, `afterEach`, and the tests, are not declared async. It occurs to me that this forces them to return a Promise, which Jest probably does not `await` for (and so they run in an unpredictable order). Instead I think they should be ordinary functions returning nothing, and any ordering of Promise calls in each function should be done with `.then()` instead. I will update here when I get a chance to try that. Answers/comments welcome in the meantime! — halfer, Jan 18 '20 at 17:49
jest await for any part(except `describe()` function) that returns `Promise`(check https://jestjs.io/docs/en/setup-teardown#repeating-setup-for-many-tests), so it definitely looks valid approach to me(and I used this for many times without any race conditions experience) — skyboyer, Jan 18 '20 at 18:50
Alright, thanks @skyboyer. Maybe I will try switching to a real MongoDB first, in that case. Perhaps there is something awry with the in-memory version. — halfer, Jan 19 '20 at 10:47
@skyboyer: a colleague pointed me to my error. I have updated the question with more useful code - can you spot it? `:-)`. I doubt I would have got this one in a month of Sundays! — halfer, Jan 21 '20 at 11:51
Would this question be of help to you? especially the self proposed solution with the redis lock? https://stackoverflow.com/questions/55629269/how-to-avoid-two-concurrent-api-requests-breaking-the-logic-behind-document-vali — Tschallacka, Jan 21 '20 at 11:58
@Tschallacka: that's most interesting, and thank you for it, but the solution is simpler than that. I will add an answer in due course. — halfer, Jan 21 '20 at 14:23
_(Note to readers: this post has attracted some serial downvotes, I suspect one of which will be removed by the reversal script. I suspect my editing has made someone grumpy)_ — halfer, Mar 14 '20 at 00:35

score 0 · Accepted Answer · answered Jan 21 '20 at 19:06

The problem in my project was not initially shown in the question - I had omitted to show the emptyCollections method, believing the implementation to be trivial and not worth showing. I have now updated that in the question.

The purpose of that method is to clear down collections between tests, so that I do not accidentally rely on serial effects based on test run order. The new version looks like this:

    this.emptyCollections = async function(collections) {
        const promises = collections.map(async (collectionName, idx) => {
            await this.getDatabase().collection(collectionName).deleteMany({})
        });

        await Promise.all(promises);
    };

So, what was going wrong with the old method? It was just a forEach wrapping awaited database operation Promises - not much to go wrong, right?

Well, it turns out that plenty can go wrong. I have learned that async functions can be run in a for, and they can be run in a for in, but the Array implementation of forEach does not do any internal awaiting, and so it fails. This article explains some of the differences. It sounds like if there is one takeaway, it is "don't try to run async code in a forEach loop".

The effect of my old function was that the collection emptying was not finished even after that method had run, and so when I did other async operations - like inserting a document! - there would be a race that determined which one executed first. This partially explains the intermittent test failure.

As it turns out, I had made some provision while debugging to use a different Mongo database per test file, in case Jest was doing some parallelisation. In fact it was, and when I removed the feature to use different databases, it failed again - this time because there was a race condition between tests (and their respective beforeEach functions) to set up the state of a single database.

How to trace possible async race conditions in my test code?

1 Answers1