I'm implementing an importer that uses a pool of workers (powered by Celery and RabbitMQ). The data being imported includes user data, identified by a unique user ID, and is stored in a PostgreSQL database.
I usually receive a lot of import files, and those files are processed concurrently. Multiple files can contain data for the same user ID, which results in two imports trying to process data for the same user ID at the same time.
If data for the specific user ID already exists, there's no issue: the database row is locked using `SELECT ... FOR UPDATE`. However, if there's no data associated with that ID yet, both imports can run into the troublesome situation where each tries to `INSERT` a row with the same user ID. As the user ID column has a `UNIQUE` constraint, the second transaction fails with a unique-violation error.
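To make the failure mode concrete, here is a simplified sketch of what each worker currently does. The table name `user_data`, the `payload` column, and the use of psycopg2 are assumptions for illustration, not my actual schema:

```python
import psycopg2

def import_user(conn, user_id, payload):
    """Upsert-style logic as each worker currently runs it (simplified)."""
    with conn:                          # commit on success, rollback on error
        with conn.cursor() as cur:
            # Lock the row if it already exists.
            cur.execute(
                "SELECT user_id FROM user_data WHERE user_id = %s FOR UPDATE",
                (user_id,),
            )
            if cur.fetchone():
                cur.execute(
                    "UPDATE user_data SET payload = %s WHERE user_id = %s",
                    (payload, user_id),
                )
            else:
                # Two concurrent workers can both reach this branch for the
                # same user_id; the slower INSERT then violates the UNIQUE
                # constraint and the transaction fails.
                cur.execute(
                    "INSERT INTO user_data (user_id, payload) VALUES (%s, %s)",
                    (user_id, payload),
                )

# conn = psycopg2.connect("dbname=imports")   # connection details are illustrative
# import_user(conn, 42, '{"name": "example"}')
```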
How can I prevent this from happening? Are there options other than locking the full table?