To upcomers:
This issue has an interesting solution.
Thanks to these two questions Q1 and Q2, and also the keen questioners and especially genius answerer Erwin Brandstetter, we can use RECURSIVE CTE to achieve partly index skip scan.
As for your example, here is the SQL:
WITH RECURSIVE mid_cte AS (
( -- parentheses required
SELECT x
FROM t
ORDER BY 1
LIMIT 1
)
UNION ALL
SELECT n.*
FROM mid_cte o
CROSS JOIN LATERAL (
SELECT next_t.x
FROM t next_t
WHERE next_t.x > o.x
ORDER BY 1
LIMIT 1
) n
)
SELECT MIN(x),MAX(x),COUNT(DISTINCT x)
FROM mid_cte
But there is something we have to be aware of:
The performance of this index skip scan emulation deeply depends on the cardinality of indexed columns, while table scan only depends on the table size.
Here is my experiment and result within:
1st part. 1k cardinality -- 2.67s table scan vs 0.03s emulated index skip scan
-- all test table t has 1.5million rows and run on a ssd.
-- ################# 1) 1k cardinality of column x #################
DROP TABLE IF EXISTS t;
CREATE TABLE t AS SELECT MD5(LEFT (RANDOM()::TEXT,5)) AS x FROM GENERATE_SERIES(1,1500000);
CREATE INDEX t_x_idx ON t USING btree(x);
-- table scan
SELECT MIN(x),MAX(x),COUNT(DISTINCT x) FROM t;
/*
### result set ###
min max count
... ... 1,161
### use time ###
2.67s
*/
-- emulate index skip scan
WITH RECURSIVE mid_cte AS (
( -- parentheses required
SELECT x
FROM t
ORDER BY 1
LIMIT 1
)
UNION ALL
SELECT n.*
FROM mid_cte o
CROSS JOIN LATERAL (
SELECT next_t.x
FROM t next_t
WHERE next_t.x > o.x
ORDER BY 1
LIMIT 1
) n
)
SELECT MIN(x),MAX(x),COUNT(DISTINCT x)
FROM mid_cte
;
/*
### result set ###
min max count
... ... 1,161
### use time ###
0.03s
*/
2nd part. 10k cardinality -- 3.44s table scan vs 0.09s emulated index skip scan
-- ################# 2) 10k cardinality of column x #################
-- other SQLs omitted here. the other SQLs are the same as above.
......
CREATE TABLE t AS SELECT MD5(LEFT (RANDOM()::TEXT,6)) AS x FROM GENERATE_SERIES(1,1500000);
......
-- table scan
/*
### result set ###
min max count
... ... 10,146
### use time ###
3.44s
*/
-- emulate index skip scan
;
/*
### result set ###
min max count
... ... 10,146
### use time ###
0.09s
*/
3rd part. 100k cardinality -- 4.27s table scan vs 0.89s emulated index skip scan
-- ################# 3) 100k cardinality of column x #################
-- other SQLs omitted here. the other SQLs are the same as above.
......
CREATE TABLE t AS SELECT MD5(LEFT (RANDOM()::TEXT,7)) AS x FROM GENERATE_SERIES(1,1500000);
......
-- table scan
/*
### result set ###
min max count
... ... 100,134
### use time ###
4.27s
*/
-- emulate index skip scan
;
/*
### result set ###
min max count
... ... 100,134
### use time ###
0.89s
*/
4th part. 776k cardinality (50% total rows) -- 5.16s table scan vs 10.38s emulated index skip scan
-- ################# 4) 776k cardinality of column x #################
-- other SQLs omitted here. the other SQLs are the same as above.
......
CREATE TABLE t AS SELECT MD5(LEFT (RANDOM()::TEXT,8)) AS x FROM GENERATE_SERIES(1,1500000);
......
-- table scan
/*
### result set ###
min max count
... ... 776,864
### use time ###
5.16s
*/
-- emulate index skip scan
;
/*
### result set ###
min max count
... ... 776,864
### use time ###
10.38s
*/
You can see when the cardinality increases, the table scan time increase much slower than emulated index skip scan. So when to use this method is important and has to be under consider.