2

I have a column that is going to store usernames of users of my website. I noticed the indexing capability in PostgreSQL and wanted to use it. When does it actually make sense to use them?

I started getting the sense that they should be only used when the search is altered. For example instead of searching for original username (SELECT * FROM table WHERE username='user'), you search for (SELECT * FROM table WHERE username=(lower(user))). Is that understanding correct?

Also, if answer to the above question is yes, then does that mean that SQL indexes by deafault on its own and only if you need some special indexes, you need to run CREATE INDEX

layonez
  • 1,746
  • 1
  • 16
  • 20
  • 2
    No, in general Postgres won't index anything by default. Besides this, you seem to be asking multiple things here. Postgres supports regular column indices as well as functional indices. So, both of your queries could benefit from a particular type of index. – Tim Biegeleisen Nov 07 '17 at 01:52
  • @TimBiegeleisen does it not index the PK by default? –  Nov 07 '17 at 01:54
  • 1
    PK's in Postgres should automatically have an index created for them. – Tim Biegeleisen Nov 07 '17 at 01:58
  • regarding use lower(user) I suggest you read this excellent answer https://stackoverflow.com/a/7005656/2067753 – Paul Maxwell Nov 07 '17 at 02:08

3 Answers3

3

Q: When does it make sense to create an index?

We create indexes for basically two reasons 1) to enforce UNIQUE constraints and 2) improve performance and reduce contention. (In terms of PostgreSQL, when we declare a UNIQUE CONSTRAINT, the database automatically creates a UNIQUE INDEX.)

Q: Is that understanding correct?

Not entirely, no. When we execute a query with predicate WHERE col = 'somevalue', without an index, the database will need to evaluate every row in the table. And on very large table, that can be a significant amount of work. With an appropriate index available the execution plan can eliminate vast swaths of rows that would otherwise need to be checked. That's where we get the real performance boost.

Think of this analogy. When we lookup a word in the dictionary, if the entries are not "in order", we would start looking at every entry in the dictionary, and compare the word we are looking for to each and every entry, until we found a match.

Fortunately, in most dictionaries, entries sorted in alphabetical order. we can make effective use of that, and very quickly eliminate a whole lot of pages from consideration, and save ourselves a lot of time. If we are looking for the word 'ostensibly', we know we can start at "O". We can "skip over" all of the entries on pages with words starting with A, B, ..., N, and also "skip over" all of the pages for words starting with P, Q, ..., Z.

In a similar way, the database engine can save a lot of work, making use of an index to "skip over" a whole lot of pages that it knows can't contain the row(s) it is looking for.

Q: ... indexes by default on its own..?

For the PRIMARY KEY constraint, yes, PostgreSQL automatically creates an index.

For a UNIQUE CONSTRAINT, yes, PostgreSQL automatically creates an index.

All other indexes must be declared (created) explicitly.

We typically want to create indexes on foreign keys, and on columns (or expressions, or sets of columns) with leading columns that have high cardinality/high selectivity and are referenced in predicates that are sargable.

Appropriate indexes can significantly improve performance; but indexes also come at a cost... there's additional storage and processing for index maintenance, so we shouldn't create singleton column indexes willy nilly. Care should be taken to evaluate the actual queries that are being executed (or we plan to execute) and implement indexes that most effectively support those operations.

spencer7593
  • 106,611
  • 15
  • 112
  • 140
  • 2
    @nikolaevra: i attempted to give a broad overview of relational database indexes in general, and not dive into the weedy details. With this overview, we are ready to talk about indexing strategy... what makes for a "good" index, and what indexes are not very useful. And that discussion really depends on the characteristics of the data, and the characteristics of the operations. e.g. for a query like `SELECT upc FROM hughjasstable WHERE upc = 'someval'`, an index on the `upc` column would be of most benefit, but the query could also make effective use of index on `(upc,lot,sz)`. – spencer7593 Nov 07 '17 at 14:05
2

PostgreSQL provides several index types: B-tree, Hash, GiST and GIN. Each index type uses a different algorithm that is best suited to different types of queries. By default, the CREATE INDEX command creates B-tree indexes, which fit the most common situations.

B-trees can handle equality and range queries on data that can be sorted into some ordering. In particular, the PostgreSQL query planner will consider using a B-tree index whenever an indexed column is involved in a comparison using one of these operators:

A multicolumn B-tree index can be used with query conditions that involve any subset of the index's columns, but the index is most efficient when there are constraints on the leading (leftmost) columns. The exact rule is that equality constraints on leading columns, plus any inequality constraints on the first column that does not have an equality constraint, will be used to limit the portion of the index that is scanned. Constraints on columns to the right of these columns are checked in the index, so they save visits to the table proper, but they do not reduce the portion of the index that has to be scanned. For example, given an index on (a, b, c) and a query condition WHERE a = 5 AND b >= 42 AND c < 77, the index would have to be scanned from the first entry with a = 5 and b = 42up through the last entry with a = 5. Index entries withc >= 77 would be skipped, but they'd still have to be scanned through. This index could in principle be used for queries that have constraints on b and/or c with no constraint on a — but the entire index would have to be scanned, so in most cases the planner would prefer a sequential table scan over using the index.

The LOWER function accepts an argument that is a string e.g., char, varchar, or text and converts it to lower case format. If the argument is string-convertible, you use the CAST function to explicitly convert it to a string..

See this for more info

Vijunav Vastivch
  • 4,153
  • 1
  • 16
  • 30
-1

You should index columns/cases that are searched for frequently. So if you frequently query for lower(user), you should create an index for it. PGSQL only creates an index on your primary key by default. See this link by w3schools for more info

BeardMagician
  • 647
  • 13
  • 26