I have a database table where each entry is a resource that can be used once by each user. Each user can only use resources that they have not already used, but the same resource can be used by different users, so if I'm not mistaken, the users and resources are in a M:N relationship, this is why I have an association table between them. The schema looks like this:
CREATE TABLE users (
id SERIAL NOT NULL,
PRIMARY KEY(id)
)
CREATE TABLE resources (
id VARCHAR NOT NULL,
created_at TIMESTAMP,
PRIMARY KEY(id)
)
CREATE TABLE resource_usage (
resource_id VARCHAR NOT NULL,
user_id SERIAL NOT NULL,
created_at TIMESTAMP,
PRIMARY KEY(resource_id, user_id),
FOREIGN KEY(resource_id) REFERENCES resources (id),
FOREIGN KEY(user_id) REFERENCES users (id)
)
I run the following SQL query (PostgreSQL 11.16) to select the first available resource for a specific user.
SELECT * FROM resources WHERE NOT EXISTS(
SELECT 1 FROM resource_usage
WHERE resources.id = resource_usage.resource_id
AND resource_usage.user_id = 1
)
LIMIT 1
To add some context, there should never be a scenario where there is no available resource for a user. I keep metrics of available resources for each user and add more accordingly. The resources
table contains a little over a million of rows, but it will keep growing. There is no explicit order in which the resources should be consumed.
As both the resources
and resource_usage
tables grow larger, the nested loop anti-join takes longer to complete (it takes longer to find the first available resource). My question is how do I write an index that would make this operation more efficient.
So far I have tried to create a sorting index where I sorted the resources by the time they were inserted. This made sense to me, because if the resource has only been added now, it's likely it wasn't used yet at all. This actually helped a little with the performance issue, but I don't add new resources very often and the performance starts to degrade quickly.