5

Problem:

I'm working on a project that consists of multiple studies and a group of users that each participates in one of the studies. Every study divides the participants in two groups based on a list that is generated using some randomization algorithm. Upon signing up each user is assigned to a study and their group is determined by the order of signup and the corresponding index in the groups list. For example if study A has total seats of 4 and the groups list is [0, 1, 1, 0] the first user is assigned to group 0, the second one to 1 and so on until the study is full.

There are other user roles defined in the project which are the admins and can be assigned to multiple studies without occupying a position in the study. That means the relation of users to studies is n:m.

The problem that occurs in the current implementation is a race condition when assigning users to studies and study groups. The code is provided below and the way it works is that it overrides the addUser of Study model and whenever a user is added to a study it checks how many users are already in the study and gives the user the current index of the group list which is the seatsTaken number. This works so long as the users are added to the study in intervals. But whenever multiple users are added at the same time the asynchronous queries cause a race condition and the seatsTaken count is affected by other users signing up at the same time.

In the example below the users which are assigned to study A in intervals have the correct groups assigned but study B with simultaneous queries has incorrect groups assignment.

const Sequelize = require('sequelize');
const assert = require('assert');

const sequelize = new Sequelize({
  database: 'database',
  username: 'username',
  password: 'password',
  dialect: process.env.DB_DIALECT || 'sqlite',
  storage: 'db.sqlite',
  logging: false
});

const User = sequelize.define('user', {
  id: {
    type: Sequelize.INTEGER,
    autoIncrement: true,
    primaryKey: true,
  },
  group: {
    type: Sequelize.INTEGER,
    allowNull: true,
    defaultValue: null
  }
});

// Groups list for studies 'A' and 'B'
const groupLists = {
  a: [0, 1, 1, 0],
  b: [1, 0, 1, 0]
}

const Study = sequelize.define('study', {
  id: {
    type: Sequelize.INTEGER,
    autoIncrement: true,
    primaryKey: true,
  },
  name: {
    type: Sequelize.STRING,
    allowNull: false
  },
  seatsTotal: {
    type: Sequelize.INTEGER,
    defaultValue: 0
  }
});

// n:m relation between users and studies
User.belongsToMany(Study, {through: 'UserStudy'});
Study.belongsToMany(User, {through: 'UserStudy'});

// Overridden 'addUser' method for groups assignment
Study.prototype.addUser = async function(user) {
  // Count already occupied seats
  const seatsTaken = await User.count({
    include: [{
      model: Study,
      where: {
        name: this.name
      }
    }]
  });
  // Add the user to study
  await Study.associations.users.add(this, user);
  // Assign the group of the user based on the seatsTaken
  await user.update({ group: groupLists[this.name][seatsTaken] });
}

sequelize.sync({force: true}).then(async () => {
  // Studies 'A' and 'B' with 4 seats
  await Study.bulkCreate([{name: 'a', seatsTotal: 4}, {name: 'b', seatsTotal: 4}]);
  // 8 users
  await User.bulkCreate(new Array(8).fill(0).map(() => ({})));

  const studies = await Study.findAll();
  const users = await User.findAll(); 

  // Assign half of the users to study 'A' in intervals
  users.filter((_, idx) => idx % 2 === 0).forEach((user, idx) => {
    setTimeout(() => {
      studies[0].addUser(user);
    }, 100*idx);
  });

  // Assign the other half to study 'B' at the same time
  await Promise.all(users.filter((_, idx) => idx % 2 === 1).map(user => {
    return studies[1].addUser(user);
  }));

  setTimeout(async () => {
    // Wait for all queries to finish and assert the results
    const userStudies = await User.findAll({
      include: [Study]
    });

    const studyUsersA = userStudies.filter(u => u.studies.some(s => s.name === 'a'));
    const studyUsersB = userStudies.filter(u => u.studies.some(s => s.name === 'b'));

    try {
      console.log('Group list A actual:', studyUsersA.map(u => u.group), 'expected:', groupLists['a']);
      assert.deepEqual(studyUsersA.map(u => u.group).sort((a, b) => a-b), groupLists['a'].sort((a, b) => a-b), 'Group list A is not assigned correctly');
      console.log('Group list B actual:', studyUsersB.map(u => u.group), 'expected:', groupLists['b']);
      assert.deepEqual(studyUsersB.map(u => u.group).sort((a, b) => a-b), groupLists['b'].sort((a, b) => a-b), 'Group list B is not assigned correctly');
      console.log(`Passed: Group lists are assigned correctly.`);
    } catch (e) {
      console.log(`Failed: ${e.message}`);
    }
  }, 500);
});

Related questions that I could find are either about incrementing one value in one table or they just mention transactions and locks without providing example code:
Avoiding race condition with Nodejs Sequelize
How to lock table in sequelize, wait until another request to be complete
Addition and Subtraction Assignment Operator With Sequelize
Database race conditions

Limitations:

  • The project stack is nodejs, expressjs and sequelize with mysql database for production and sqlite for development and tests.
  • The solution should work for both sqlite and mysql.
  • It is preferred that the groups lists are not stored in the database. The lists are generated by an algorithm and randomization seeds but in the example code they're hardcoded.
  • The solution should be a sequelize solution and not throttling or queuing user requests in express server.
  • In the case of simultaneous requests it is not strictly required that the exact order of users signup is preserved since it's not really verifiable which user is added to the study first, but the final result must have the correct number of 0s and 1s which are the assigned groups.
  • I've tried sequelize transactions but I had lots of issues with sqlite compatibilities and also express requests failing because of database locks but that might have been because of my lack of knowledge on how to do it properly. The limitation here is that requests should not fail because of database locks.

The provided code is a minimal example reproducing the issue. Please use it as a base.

To run the code

npm install sequelize sqlite3 mysql2

sqlite:

node index.js

mysql (using docker):

docker run -d --env MYSQL_DATABASE=database --env MYSQL_USER=username --env MYSQL_PASSWORD=password --env MYSQL_RANDOM_ROOT_PASSWORD=yes -p 3306:3306 mysql:5.7
DB_DIALECT=mysql node index.js

Note:

  • The example code is only for demonstration of the issue in the current implementation and the intervals and timeouts are there to simulate the users interaction with the server. Please do not focus on the patterns in the example being wrong and rather focus on the problem itself and how it can be approached in a better way while meeting the requirements mentioned in the limitations section.
  • This is part of a fairly big project and I might update the requirements based on the actual project requirements and the feedbacks that I receive here.

Please let me know if I should provide any other information. Thank you in advance.

jal
  • 1,100
  • 1
  • 8
  • 17

1 Answers1

2

I'm afraid this is expected behavior.

  • You declare seatsTaken as an asynchronously computed property.
  • You insert several users asynchronously, too.
  • You don't isolate each user creation within its own transaction.

Because of this, you see the changing state of one transaction, and it's changing rather chaotically because you do not specify any certain order. Eventually the state becomes consistent, but your way to achieve that state was just to wait for some time.

I suppose the easiest way to achieve consistency would be wrapping each insertion in a transaction.

If a transaction per insert is too slow, you can bulk insert all user records in one transaction, and then count seats taken in another, or even just do everything synchronously.

In any case, if you want consistency, you need logical serialization, a clear "before-after" relation. Currently your code lacks it, AFAICT.

9000
  • 39,899
  • 9
  • 66
  • 104
  • Surely the example is far from a working solution and it's there only to demonstrate the problem. The problem itself is a real world pragmatic issue. So the question is how would someone approach this problem to build a working solution. Can you improve the example with your suggestions? – jal Jul 09 '19 at 11:12