Clustered index in PostgreSQL?

Question

I have a table :

CREATE TABLE users (
        id BIGSERIAL PRIMARY KEY,
        first_name varchar(255) NOT NULL,
        last_name varchar(255) NOT NULL,
        cell_id   BIGINT                  
        ...
)

cell_id is an uint64 from s2 and can represent any position on the planet.

-> click here for good description of s2

Now I'd like to have an index on cell_id mostly to use an equality operator.

CREATE INDEX user_position ON users (cell_id);

But now I'm afraid that this index is going to be overly updated and queried and in the end lead up to deadlocks.

So I've had the idea of doing something like this

CREATE INDEX user_position_even ON users (cell_id) WHERE user id % 2 = 0
CREATE INDEX user_position_odd ON users (cell_id) WHERE user id % 2 = 1

and even may be add more indexes/scarcity.

Now I have some questions:

Will Postgres use both indexes when I do a query?
Would this help to keep performances up?
Was my first concern wrong?
Should I just have different tables instead of indexes?
Would there be another way of doing this better?

Your queries, which does not filter the `id` column (f.ex. only queries the `users` table by location aka. `cell_id`) won't use any of your "clustered" indexes. -- Indexes on their own won't cause deadlocks, usually multiple tables are involved in deadlocks. — pozs, Jan 16 '17 at 11:58
Sidenote: PostgreSQL [doesn't have a true unsigned integer type](http://stackoverflow.com/questions/20810134/why-unsigned-integer-is-not-available-in-postgresql). If you can work around this in your client (f.ex. with some binary re-interpretation) that's cool, otherwise you could use some extension, f.ex. [`pguint`](https://github.com/petere/pguint). — pozs, Jan 16 '17 at 11:58
On your first comment : So because I'm using equality the index would not be used ? — Azr, Jan 16 '17 at 13:09
My point is that queries with `FROM users WHERE cell_id = $1 AND id % 2 = 0` will certainly use that index, `WHERE cell_id = $1 AND id = $2` could use one of those indexes, but only `WHERE cell_id = $1` won't use any of those indexes. — pozs, Jan 16 '17 at 13:35

score 0 · Accepted Answer · edited Jan 16 '17 at 17:09

0

Will Postgres use both indexes when I do a query ?

While postgresql can certainly use more than one index on a table, your id will either be divisible by 2 or indivisible so only one of these indexes can be used.

Would this help to keep performances up ?

Unlikely. The default index type on postgresql is B-Tree and here you are practically doing part of what the index would do for you, on your own. But benchmark just to be sure.

Was my first concern wrong ?

I would say this is called premature optimization. B-Tree indexes are pretty good at dealing with it. Come back here if you actually run into a deadlock

B-tree, GiST and SP-GiST indexes
Short-term share/exclusive page-level locks are used for read/write access. Locks are released immediately after each index row is fetched or inserted. These index types provide the highest concurrency without deadlock conditions.

Should I just have different tables instead of indexes ?

No, Postgresql can handle very large tables. If this ever becomes a concern. partition.

Would there be another way of doing this better ?

You don't have something that's broken and needs fixing. Come back when you do.

edited Jan 16 '17 at 17:09

answered Jan 16 '17 at 12:28

e4c5

52,766
11
101
134

Aha, premature optimisation ! Now I can't disagree with you. Thanks for the answer, will try with the simple default way and see if we have to improve. – Azr Jan 16 '17 at 13:02
Also I have to add that lot of those cell_id are going to be similar because cities. This is why I thought there would be locks in pages. But this is certainly going to be mitigated by the fact that there will be a lot of pages. – Azr Jan 16 '17 at 13:06
Or if I reduce the size of a cell – Azr Jan 16 '17 at 13:14
Big int to int will not make much of a difference. If you are concerned about there being duplicate cell_ids the concern may or maynot be justified based on the queries that you execute on it. I think best to run for a few days and post another question with the full query that you find slow and the result of the `explain analyze` – e4c5 Jan 16 '17 at 13:17
owkay ! :) will do ! Thanks for your precious time ! – Azr Jan 16 '17 at 13:21

Clustered index in PostgreSQL?

1 Answers1