My app receives events from external services, then resolves an external id received in the event to an internal user id, if no internal user exists it should create one.
My users
table is defined in this way:
CREATE TABLE users (
id SERIAL PRIMARY KEY, /* internal unique ID */
external_ids text[], /* array of unique external IDs for this user */
/* ... irrelevant data fields */
);
When a new event is received I have to resolve the internal user id
for that event to further determine what is the appropriate response.
If no user is found, a new user should be created with a new id
and external_ids
containing the event.external_id value.
The system can receive multiple concurrent events for the same external_id
(which should point to the same user id
).
The reason external_ids
is defined as an array and not a simple string is because the there is a different kind of event that can specify an id
and an external_id
to add to that user (so future events can resolve the same user id
using different external_ids
).
Simple pseudocode demonstrating my desired flow:
event = read_event()
external_id = event.external_id;
internal_user = exec_sql(`SELECT * FROM users WHERE external_ids && text["{external_id}"]`)
if not internal_user:
internal_user = exec_sql(`INSERT INTO users (external_ids) VALUES(text["{external_id}"])`)
write_response(internal_user)
This code does not handle a case where 2 concurrent threads get nothing from the SELECT statement. It then INSERTs 2 different rows to the users
tables for the same value, which should never happen.
I use PostgreSQL 11. I have read the documentation for Transaction Isolation and INSERT (and its "ON CONFLICT" section) along with trying to find a suitable answer online, it seems that with an index it is possible to force unique arrays but not unique array values in arrays as described in Postgres UNIQUE CONSTRAINT for array.
I assume that unnesting the users
table and moving external_ids
to a separate users_external_ids
table which will be defined somewhat like this:
CREATE TABLE users_external_ids (
id REFERENCES users(id), /* internal unique ID from the users table */
external_id text, /* single external_id (this table will have multiple rows to represent the array) */
)
can allow the use of INSERT ON CONFLICT (id, external_id) DO NOTHING
for this table.
Finally, my questions are:
Is there another solution that would not involve changing the current data model? (and at what cost, this is probably the preferable solution for me)
Am I correct assuming splitting the tables is the "correct" course of action and will solve the problem I face?
Is there something I'm missing here?
Thank you