During my reading to the book [The internals of PostgreSQL] I have been in the chapter 3
Book chapter link: https://www.interdb.jp/pg/pgsql03.html
at 3.2.2. Index Scan
specifically at 3.2.2.1. Start-Up Cost
The following query is being explained and the cost of the index is to be counted here
SELECT id, data FROM tbl WHERE data < 240;
Before estimating the cost, the numbers of the index pages and index tuples, N_index_page, N_index_tuples
testdb=# SELECT relpages, reltuples FROM pg_class WHERE relname = 'tbl_data_idx';
relpages | reltuples
----------+-----------
30 | 10000
(1 row)
Nindex,tuple = 10000
Nindex,page = 30
It is mentioned that
The start-up cost of the index scan is the cost to read the index pages to access to the first tuple in the target table, and it is defined by the following equation:
‘start_up cost
’ = {ceil(log2(Nindex,tuple)) + (Hindex + 1) × 50} × cpu_operator_cost
Where Hindex: is the height of the index tree.
Nindex,tuple is 10000; Hindex is 1; cpu_operator_cost
is 0.0025 (by default). Thus,
‘start_up cost
’ = {ceil(log2(10000)) + (1 + 1) × 50} × 0.0025 = 0.285
My question comes here at that equation What is that 50? Is it a heuristic number has a statistically meaning?
Cost size source code link (has not mentioned that 50 at all from my narrow checking) https://github.com/postgres/postgres/blob/master/src/backend/optimizer/path/costsize.c#L502-L792
I have tested that on Postgres server to make sure of the startup cost estimation and found that
test=# EXPLAIN SELECT id, data FROM tbl WHERE data < 240;
QUERY PLAN
---------------------------------------------------------------------------
Index Scan using tbl_data_idx on tbl (cost=0.29..13.47 rows=239 width=8)
Index Cond: (data < 240)
(2 rows)
Cost is 0.29 Same as (math.ceil(math.log2(10000))+(1+1)*50)*0.0025
Same as which is mentioned on the book