4

I have a function on my website that saves a bunch of values quite quickly to the same DataObject type. Most of the time it's OK but occasionally I get an error

ERROR: duplicate key value violates unique constraint ...

Reading through the documentation I see:

SilverStripe does not use the database's built-in auto-numbering system. Instead, it will generate a new ID by adding 1 to the current maximum ID

And previously looking through the code it looks like it retrieves the max number from the primary key, inserts a record with that ID, then sets the values of the DataObject and writes again. In my load balanced environment, when these multiple entries are sent, I believe the insert is happening with the same primary key, hence the error.

As far as I can see this is an issue I can't get around. From other questions and doco I can't set a composite primary key. Only thing I can think of is to run a custom sql for the create which does use the DB's inbuilt auto-numbering system.

Is there a better way to deal with this error or a way I can set a composite primary key?

EDIT

The full error is

Query failed: ERROR: duplicate key value violates unique constraint 'TABLE_pkey'
DETAIL: Key ('ID')=(136) already exists.

And the statement:

INSERT INTO "TABLE" ("ClassName", "Name", "MemberID", "OtherTabeID", "Value", "LastEdited", "Created", "ID") VALUES ($1, $2, $3, $4, $5, $6, $7, $8),Array) 

I read this as it's inserting the ID from a previously determined value rather than relying on the DB auto-increment. Is that correct?

EDIT 2

Looking through logs it looks like the INSERT is done first with Created field, then select statement is done to get the ID:

SELECT last_value FROM "TABLENAME_ID_seq"

then an UPDATE is done with the additional details being saved.

I feel like this could be a race condition that would cause saving to incorrect rows, though not cause what I'm currently experiencing. Ideally any INSERT would have a returning "ID" that would be used for the update command.

EDIT 3

The above process is contrary to the stack trace I have which shows the insert includes more than just Created:

pg_query_params(Resource id #154,INSERT INTO "TABLENAME" ("ClassName", "Name", "MemberID", "OTHERTABLEID", "Value", "LastEdited", "Created", "ID") VALUES ($1, $2, $3, $4, $5, $6, $7, $8),Array) 
PostgreSQLConnector.php:200
PostgreSQLConnector->preparedQuery(INSERT INTO "TABLENAME" ("ClassName", "Name", "MemberID", "OTHERTABLEID", "Value", "LastEdited", "Created", "ID") VALUES (?, ?, ?, ?, ?, ?, ?, ?),Array,256) 
Database.php:143
SS_Database->{closure}(INSERT INTO "TABLENAME" ("ClassName", "Name", "MemberID", "OTHERTABLEID", "Value", "LastEdited", "Created", "ID") VALUES (?, ?, ?, ?, ?, ?, ?, ?)) 
Database.php:193
SS_Database->benchmarkQuery(INSERT INTO "TABLENAME" ("ClassName", "Name", "MemberID", "OTHERTABLEID", "Value", "LastEdited", "Created", "ID") VALUES (?, ?, ?, ?, ?, ?, ?, ?),Closure,Array) 
Database.php:146
SS_Database->preparedQuery(INSERT INTO "TABLENAME" ("ClassName", "Name", "MemberID", "OTHERTABLEID", "Value", "LastEdited", "Created", "ID") VALUES (?, ?, ?, ?, ?, ?, ?, ?),Array,256) 
DB.php:365
DB::prepared_query(INSERT INTO "TABLENAME" ("ClassName", "Name", "MemberID", "OTHERTABLEID", "Value", "LastEdited", "Created", "ID") VALUES (?, ?, ?, ?, ?, ?, ?, ?),Array) 
SQLExpression.php:121
Rudiger
  • 6,749
  • 13
  • 51
  • 102
  • Can you use a sequence with it? – Jorge Campos Oct 06 '17 at 03:37
  • @JorgeCampos Could you do that through the ORM? I can directly write SQL in it but it's avoiding all the good things that come with the ORM so I was hoping to avoid it. – Rudiger Oct 06 '17 at 04:08
  • 1
    Unfortunately I don't know silverstripe to answer that. I suggested a sequence because it is a builtin function to the database. Searching around it MAY be possible using the [manipulate](http://api.silverstripe.org/en/3.0/class-SS_Database.html#_manipulate) method, not sure though. – Jorge Campos Oct 06 '17 at 04:13
  • Can you show the *full, complete* error message please? In particular, showing the table name? – Craig Ringer Oct 09 '17 at 01:11
  • @CraigRinger question updated with important bits, the error is a stack trace which I'll look into a bit. I'm a bit limited as to what I can test because the DB is an AWS RDS PostgreSQL db, it's the production one and the error really only occurs with a fluke, though the more people on it, the more likely it happens. – Rudiger Oct 09 '17 at 03:37
  • 1
    Re ID generation, there's no way to tell how the framework picks the ID it passes as parameter `$8` from the query. That's why I suggest setting `log_statement = all` and a better `log_line_prefix`, so you can see what is done during a whole session. It could be calling `nextval(..)` on a sequence manually, then passing the result to the `insert`. Which would be fine. – Craig Ringer Oct 09 '17 at 04:33
  • Yeah fair enough, I'll look into turning it on. – Rudiger Oct 09 '17 at 06:13
  • 1
    `SELECT last_value FROM "TABLENAME_ID_seq"` is ... deeply sussed. You shouldn't access a sequence that way. The only thing you should ever be using is `nextval`, and *if you're inserting a single record only*, `lastval`. Directly `SELECT`ing from the sequence relation will stop working in newer PostgreSQL versions, and has never been documented or supported; it's also unsafe under concurrency. So I'd say you found your bug. – Craig Ringer Oct 09 '17 at 13:19

3 Answers3

4

That note in the documentation is very out of date (even SilverStripe 2.1 from 2007 has the correct behaviour) and the approach described by the docs would lead to race conditions.

The complexity is that SilverStripe uses multi-table inheritance and what SilverStripe does is such cases is this:

  • Insert into the SiteTree table
  • Fetch the ID generated
  • Insert into the Page table (and other tables) using the same ID

It may also do subsequent UPDATE writes to the SiteTree table with the same ID.

Unfortunately this doesn't necessarily help you solve your issue, but it can at least close off one possible source of the issue.

Sam Minnée
  • 724
  • 5
  • 14
  • Thanks Sam, that describes the section I found in the framework previously. I'm dealing with a plain `DataObject` so I I don't believe related at all to the SiteTree table. So from 2007 it should use the db's built in auto increment? Sort of contrary to what I'm currently experiencing with the race condition in production. – Rudiger Oct 08 '17 at 23:09
  • 2
    @Rudiger Can you enable `log_statement = all` in PostgreSQL please, and set your `log_line_prefix` to something like `%m %p %u %d %a %v:%x %c:%l` ? Then `SELECT pg_reload_conf()`. This will make PostgreSQL output transaction ID information and session information, so you can see what exactly the ORM is sending to the database and in what transactions. This will let you find the error in the logs, then track back to reassemble the session that led up to it and find out what exactly is happening. – Craig Ringer Oct 09 '17 at 01:21
2

Edit: The documented SilverStripe's generated key support is broken, and uses a method of identifier generation that won't work properly. However, one of the devs has confirmed that this is a documentation bug, and the framework's true behaviour is no longer to use max() queries. So the problem is not there.

For anyone wondering why it's wrong to use max(...) for key generation: it's totally concurrency unsafe. Even in subqueries. If you do:

INSERT INTO my_table(id, ...)
VALUES
(
  (SELECT max(id) + 1 FROM my_table),
  ...
);

then both SELECTs can run at the same time. They'll get the same result, and then both inserts will try to insert the same value. Even if one insert completes before the other select runs, if it hasn't committed yet, the other select won't see the new value.

It's only safe to do that if you LOCK TABLE first, or SELECT ... FOR UPDATE in the subquery. A SELECT ... FOR UPDATE is much slower in this case.

So if you can modify SilverStripe, change it to at least send a LOCK TABLE mytable IN EXCLUSIVE MODE; before the SELECT max(...) so it's slow and clumsy but not also broken.

Or, better, fix it to just use a database sequence.

If there's a real business need for gap-less numbering, use a counter table maintained by UPDATE ... RETURNING ... instead. (If you need portability you must use SELECT ... FOR UPDATE then UPDATE, in the same transaction).

Update: The framework no longer uses that approach in recent versions.

(Removed grumpy rant about the approach as non-constructive)

Craig Ringer
  • 307,061
  • 76
  • 688
  • 778
  • Brutal. To be honest this was the first time I've come across something in the framework that was wtf worthy. I'd really like to know the reason why they chose to use a max + 1. SQL injection wise I know it's very solid, I've had a good hack at it. – Rudiger Oct 06 '17 at 06:18
  • I don't know, but it's "SQL 101 failed" level bad. I have no idea what they were smoking, but they need to fix it or that framework is 10-foot pole territory as far as I'm concerned. – Craig Ringer Oct 06 '17 at 06:48
  • @CraigRinger I'm not sure that you can justify your assumptions on a framework by one feature that doesn't tick your box. I've seen frameworks that use autoincrement that have real nasty injection issues so those don't come hand in hand. Of course it should use autoincrement but id suppose there has been an edge case reason to do that way. Every framework has its comprimises on particular aspects – Olli Tyynelä Oct 06 '17 at 07:37
  • 1
    There are perfectly sensible reasons not to use autoincrement. There are none I can think of to use an *obviously* wrong method to select candidate keys. If they used their own homebrew sequences with `select .... for update` that'd be slow but OK. If they did a `LOCK TABLE` first that'd be OK too, it'd be slow but correct. This ... this is just "author failed introduction to SQL" wrong. So it's not a feature that doesn't tick my box. It's a bug as amazing as seeing production code that, say, takes variables from `$_GET` and uses them as SQL identifier names, or trusts stuff in cookies... – Craig Ringer Oct 06 '17 at 08:16
  • @OlliTyynelä It's entirely possible the framework is otherwise OK. But ... imagine you saw someone's code and it seemed OK, then you saw it took stuff in `$_GET` and used it as identifiers concatenated into SQL queries. This bug is that level of "WTF". It'd beyond a "code smell" like use of `addslashes()`. – Craig Ringer Oct 06 '17 at 08:26
  • @CraigRinger but to answer a with "its bad, dont use that (node,zend,mysql,postreqsl, what ever) because of this one stupid thing that annoys me" could be used on EVERY so question. Imho doesn't help the asker. – Olli Tyynelä Oct 06 '17 at 09:40
  • @OlliTyynelä Fair, I guess. That said, I did address the actual issue, I just then expressed my astonishment that it existed. – Craig Ringer Oct 06 '17 at 09:59
  • 6
    Hi everyone, I was original lead dev of SilverStripe, the bug here is in the documentation. I think SilverStripe may have used the approach described in 2007, and we pretty rapidly identified the issues with that. It's embarrassing that the documentation has been out of date for all that time, but we'll fix that now. Craig: the framing of your comments are quite rude. I'd recommend that you take a less vehement approach to express yourself, you're likely to find it easier to get people on board with your ideas if you do. – Sam Minnée Oct 08 '17 at 22:21
  • 2
    @SamMinnée Fair, and thanks. You're right, and I should know better. Revised. – Craig Ringer Oct 09 '17 at 01:09
  • 1
    ... especially since I'm responsible for some truly awful code myself. – Craig Ringer Oct 09 '17 at 01:21
  • Looks like the PostgreSQL module could be at fault, it accessed it different than the SilverStripe core MySQL classes. A pull request has been merged into and so far I haven't replicated the issue. – Rudiger Oct 21 '17 at 09:46
1

As @CraigRinger pointed out, the way the inserted ID was being accessed was across all sessions, rather than per session. I've updated the PostgreSQL module to instead use currval() which is session based.

So far I haven't replicated the issue again however I'm not entirely convinced this is the core of the issue. I'll update this if it faults again.

Rudiger
  • 6,749
  • 13
  • 51
  • 102