0
select * from table where username="johndoe"

In Postgres, if username is not a primary key, I know it will iterate through all the records.

But if it is a primary key field, will the above SQL statement iterate through the entire table, or terminate as soon as username is matched. In other words, does "where" act differently when it is running on a primary key column or not?

Aayush Karki
  • 781
  • 3
  • 10
  • 25
  • Why do you have a column named `johndoe`? For details on how the databases uses an index, you might want to read http://use-the-index-luke.com/ –  Jun 30 '17 at 05:34
  • 'johndoe' is not a column. 'username' is a column. 'johndoe' is an entry into the column – Aayush Karki Jun 30 '17 at 05:35
  • 3
    `"johndoe"` is an identifier (=column), `'johndoe'` would be a string value –  Jun 30 '17 at 05:35
  • Thanks a lot. I never knew double and single quotes meant different in SQL – Aayush Karki Jun 30 '17 at 05:37
  • 2
    @KeyurPanchal: no it will not. If there is an index, the database will use that index to find the row. –  Jun 30 '17 at 05:39
  • @AayushKarki: see here: https://www.postgresql.org/docs/current/static/sql-syntax-lexical.html#SQL-SYNTAX-IDENTIFIERS –  Jun 30 '17 at 05:40
  • @KeyurPanchal: that is hardly iterating through the **whole** table. –  Jun 30 '17 at 06:11

1 Answers1

0

Primary keys (and all indexed columns for that matter) take advantage of indexes when those column(s) are used as filter predicates, like WHERE and JOIN...ON clauses.

As a real world example, my application has a table called Log_Games, which is a table with millions of rows, ID as the primary key, and a number of other non-indexed columns such as ParsedAt. Compare the below:

INDEXED QUERY

EXPLAIN ANALYZE
SELECT *
FROM "Log_Games"
WHERE "ID" = 792046

INDEXED QUERY PLAN

Index Scan using "Log_Games_pkey" on "Log_Games"  (cost=0.43..8.45 rows=1 width=4190) (actual time=0.024..0.024 rows=1 loops=1)
  Index Cond: ("ID" = 792046)
Planning time: 1.059 ms
Execution time: 0.066 ms

NON-INDEXED QUERY

EXPLAIN ANALYZE
SELECT *
FROM "Log_Games"
WHERE "ParsedAt" = '2015-05-07 07:31:24+00'

NON-INDEXED QUERY PLAN

Seq Scan on "Log_Games"  (cost=0.00..141377.34 rows=18 width=4190) (actual time=0.013..793.094 rows=1 loops=1)
  Filter: ("ParsedAt" = '2015-05-07 07:31:24+00'::timestamp with time zone)
  Rows Removed by Filter: 1924676
Planning time: 0.794 ms
Execution time: 793.135 ms

The query with the indexed clause uses the index Log_Games_pkey, resulting in a query that executes in 0.066ms. The query with the non-indexed clause reverts to a sequential scan, which means it goes from the start to the finish of the table to see which columns match, an operation that causes the execution time to blow out to 793.135ms.

There are plenty of good resources around the web that can help you read execution plans and decide when they may need supporting indexes. A good place to start is the PostgreSQL documentation: https://www.postgresql.org/docs/9.6/static/using-explain.html

e_i_pi
  • 4,590
  • 4
  • 27
  • 45