40

Say, I have a table ResidentInfo with a unique constraint on the column HomeAddress, which is VARCHAR type.

I plan to add an index on this column. The query will only have operation =. I'll use a B-TREE index since the Hash indexes are not recommended currently.

Question:
For efficiency of this B-TREE index, should I add a new column with numbers 1,2,3....,N corresponding to different home addresses,and index that number instead?

I ask this question because I don't know how indexes work.

Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228
Hao
  • 403
  • 1
  • 4
  • 5
  • According to performance there is one guideline that always applies: test it. It is impossible to get all your usecases from such vague description so when you are asking about speed, test what is fastest for you. There are cases where theoretically suboptimal approach is faster for data you usually process. – omikron Oct 23 '15 at 10:06

2 Answers2

54

For simple equality checks (=), a B-Tree index on a varchar or text column is simple and the best choice. It certainly helps performance a lot. And a UNIQUE constraint (like you mentioned) is already implemented with such an index, so you would not create another one.

Of course, a B-Tree index on a simple integer performs better. For starters, comparing simple integer values is a bit faster. But more importantly, performance is also a function of the size of the index. A bigger column means fewer rows per data page, means more pages have to be read ...

Since the HomeAddress is hardly unique anyway, it's not a good natural primary key. I would strongly suggest to use a surrogate primary key instead. A serial column or IDENTITY in Postgres 10+ is the obvious choice. Its only purpose is to have a simple, fast primary key to work with.

If you have other tables referencing said table, this becomes even more efficient. Instead of duplicating a lengthy string for the foreign key column, you only need the 4 bytes for an integer column. And you don't need to cascade updates so much, since an address is bound to change, while a surrogate PK can stay the same (but doesn't have to, of course).

Your table could look like this:

CREATE TABLE resident (
   resident_id serial PRIMARY KEY
 , address text NOT NULL
   -- more columns
);

CREATE INDEX resident_adr_idx ON resident(address);

This results in two B-Tree indexes. A unique index on resident_id (implementing the PK) and a plain index on address.

Postgres offers a lot of options - but you don't need any more for this simple case. See:

Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228
  • Thank you so much! This really helps! So, the two B-Tree indexes gonna speed up queries like "SELECT * FROM resident WHERE resident_id=xxxxx;" and also give me an option in case I have to query using address, am I right? – Hao Jun 06 '13 at 02:35
  • 1
    @Hao: Correct. Plus, both indexes support more than simple equality checks. – Erwin Brandstetter Jun 06 '13 at 12:23
  • Thank you! As you mentioned, with regard to B-TREE's operations, EnterpriseDB's Hash Pattern Index still has flaw right now, and I may switch to Hash Pattern once they fix it, since I'm only using "=" operation for query. Takes O(1) for Hash and O(nlogn) for B-Tree. – Hao Jun 06 '13 at 17:08
12

In Postgres, a unique constraint is enforced by maintaining a unique index on the field, so you're covered already.

In the event you decide the unique constraint on the address is bad (which, honestly, it is: what a spouse creating a separate account? about flatshares? etc.), you can create one like so:

create index on ResidentInfo (HomeAddress);
Denis de Bernardy
  • 75,850
  • 13
  • 131
  • 154
  • Oh, thanks for pointing that out! But the question still remains there. Will the query becomes faster if I add a number column and use it instead of address? – Hao Jun 04 '13 at 18:25