-1

I have a Lambda that gets thousands of events sent to it at one time. Concurrency is left at the default, which means AWS will spin up several instances to handle the incoming events. The Lambda takes the data and inserts some data into a database if that data doesn't already exist. The Lambda is written in Node.js and uses Knex to connect to a Postgres database.

The Lambda essentially contains this logic:

Does a record with ID X exist?     
a. Yes: do nothing
b. No: create a new record with ID X.

The problem is that when 50 Lambdas spin up at the same time, they'll enter a race condition where, say, 3 or 4 of them will check for the existing record at the same time (or within microseconds of each other) and not find it, therefore inserting multiple, duplicate records.

I know one way to solve this would be to create a unique constraint on the table to prevent multiple records with ID X. Then my logic would look like this:

Does a record with ID X exist? 
a. Yes: do nothing 
b. No: create a new record with ID X.
   b.1. Did that succeed?
      a. Yes: continue on.
      b. No, it threw a unique constraint error: go back to line 1.

This seems a bit contrived, but should work. Is there a better option?

EDIT:

Here is the actual code:

let location = await Location.query().where({ external_id }).first();
if(!location){
    location = await Location.query().insert({
        name,
        external_id
    });
}
user2719094
  • 1,611
  • 5
  • 26
  • 36
  • I think this question is not necessarily specific to aws-lambda or postgresql...it's generalized to [consistency models](https://en.wikipedia.org/wiki/Consistency_model) and which route to take. [Jespen](https://jepsen.io/consistency) has some diagrams to help explain the strengths of each model – RichS Apr 19 '19 at 05:53
  • And perhaps a good first read would be with: [Correctness](https://aphyr.com/posts/313-strong-consistency-models). This link has more drawings to help explain. – RichS Apr 19 '19 at 05:58
  • @JaromandaX I have added the *actual* code. It literally does what I explained, but I hope you're satisfied. – user2719094 Apr 19 '19 at 06:00
  • oh ... I was thinking the code for `insert` - but that's probably not your code anyway - sorry – Jaromanda X Apr 19 '19 at 06:04
  • 1
    Some databases let you perform an "upsert" operation which means: "**up**date if found, otherwise in**sert**" – RichS Apr 19 '19 at 06:06
  • if by the `insert` code you mean the code that actually constructs the SQL statement, then yeah, I'm using Knex (or Objection.js on top of Knex) – user2719094 Apr 19 '19 at 06:07
  • 1
    @RichS that's an interesting idea. I think Postgres does support that, but not sure Knex does. I will investigate. Thanks for the idea. – user2719094 Apr 19 '19 at 06:09
  • I think postgres "upsert" is not built in (yet?*) but I think it's achieved with something like `INSERT ... ON CONFLICT UPDATE` (*some folks asked postgres for it...) – RichS Apr 19 '19 at 06:15
  • yeah, that's what it seems like, so it's still just dependent on having a unique constraint and then providing an update statement as the failover. – user2719094 Apr 19 '19 at 06:18

1 Answers1

2

Code like this:

Does a record with ID X exist?      
a. Yes: do nothing 
b. No: create a new record with ID X.

without locking the database somehow is a race condition. Between querying for record X and creating it, some other request can create it too. Don't do it this way, ever. This is racy.

You have to look at the specific tools your database offers, but a common way to execute the above sequence is to set up the database so that it doesn't allow duplicates for ID X and then you just attempt to create the record with ID x. Then, it will atomically either get created or return an error and there will be no opportunity for a race condition. You just look for the error and handle it

jfriend00
  • 683,504
  • 96
  • 985
  • 979
  • 1
    Linking: [How do I ALTER a PostgreSQL table and make a column unique?](https://stackoverflow.com/questions/469471/how-do-i-alter-a-postgresql-table-and-make-a-column-unique) – RichS Apr 19 '19 at 06:02
  • 1
    Thank you. So your recommendation is to do as I suggested and create a unique constraint, letting an insert fail, and then falling back to trying to select the record a second time, when it should now exist? – user2719094 Apr 19 '19 at 06:04
  • 1
    @user2719094 - Generally yes. But, I don't know what you mean by select it the second time. If you're trying to create or update the record, then many databases have an atomic operation for that specific operation (sometimes called upsert) which will do an insert or update. – jfriend00 Apr 19 '19 at 06:09
  • @jfriend00 yeah, i realize that probably wasn't clear. I really need to get the row ID for the inserted record so I can use it somewhere else, so if the insert fails, I still need to retry the select so I can get the ID I need. – user2719094 Apr 19 '19 at 06:10
  • Long version: [Postgres UPSERT](https://wiki.postgresql.org/wiki/UPSERT#.22UPSERT.22_definition). Short-version: INSERT ... ON CONFLICT UPDATE – RichS Apr 19 '19 at 06:17