0

As far as I understand, when we perform searching by indexed column there're 3 steps to make it:

  1. Tree traversal (finding the first matching entry)

  2. Requesting the following leaf node.

  3. Fetching the table.

Why do we need to perform the third step? Couldn't you explain its purpose?

user3663882
  • 6,957
  • 10
  • 51
  • 92

1 Answers1

2

I'm not sure what your source means by "fetching the table" but in a non-clustered index the leaf nodes contain references to the location of the data. So the last step would be to go to that location and collect the necessary table data.

In a clustered index the data is stored as part of the leaf node, so the data is still fetched but it's from essentially the same location.

D Stanley
  • 149,601
  • 11
  • 178
  • 240
  • Makes sense, thank you. But how can I find out if the postgresql creates clustered index or nonclustered one? – user3663882 Jul 02 '15 at 17:20
  • A table can only have one clustered index - you have to explixitly set the clustered index with a command like `ALTER TABLE member CLUSTER ON member_name_idx;` – D Stanley Jul 02 '15 at 17:28
  • @DStanley: `CLUSTER ON` does not actually cluster the table; this is done with [the cluster command](http://www.postgresql.org/docs/9.4/static/sql-cluster.html). `CLUSTER ON` just sets which column to use when clustering. – Kevin Jul 02 '15 at 18:51
  • @Kevin Thanks for the clarification. I was going off of the documentation and did not catch that part. – D Stanley Jul 02 '15 at 19:03
  • "Clustering" in Postgres doesn't quite work like this. It's a one-off reordering of the physical storage, but this order is not maintained after subsequent changes to the table, and the structure of the index itself is unaffected. So the lookup/fetch procedure is identical to a non-clustered index. – Nick Barnes Jul 02 '15 at 23:00
  • @NickBarnes: If you cluster regularly, and perform an `ANALYZE` after each cluster, the query planner will be (indirectly) aware of your clustering, and this will influence how it executes queries, including the choice of indices vs. a flat table scan. – Kevin Jul 06 '15 at 16:13
  • 1
    @Kevin: Sure, but the answer says that a clustered index contains full tuples rather than heap pointers, and that no heap fetches are required. And while this is true of [some implementations](http://stackoverflow.com/a/24470091/1104979), it is not the case in Postgres. – Nick Barnes Jul 07 '15 at 02:08
  • @Nick: The table is the clustered index. It contains full tuples and is in order. – Kevin Jul 07 '15 at 03:12
  • @Kevin: Again, true of other RDBMSs, but not Postgres. The table is still an unordered heap, and the index leaves still contain heap pointers. Clustering will keep the heap *approximately* in order, for the sake of improving cache hit rates during index scans, but a clustered index is not a distinct data structure like it is in Oracle/MSSQL/MySQL. – Nick Barnes Jul 07 '15 at 06:32