I am working on a project using PostgreSQL 9.3. I was using the query below to show how selectivity estimation errors can lead to a multi-fold increase in query execution time on the TPC-H workload under PostgreSQL 8.3.
select
    n_name,
    sum(l_extendedprice * (1 - l_discount)) as revenue
from
    customer,
    orders,
    lineitem,
    supplier,
    nation,
    region
where
    c_custkey = o_custkey
    and l_orderkey = o_orderkey
    and l_suppkey = s_suppkey
    and c_nationkey = s_nationkey
    and s_nationkey = n_nationkey
    and n_regionkey = r_regionkey
    and (r_name = 'ASIA' or r_name = 'AFRICA')
    and o_orderdate >= date '1994-01-01'
    and o_orderdate < date '1994-01-01' + interval '1 year'
    and l_shipdate <= l_receiptdate
    and l_commitdate <= l_shipdate + integer '90'
    and l_extendedprice <= 20000
    and c_name like '%r#00%'
    and c_acctbal <= 2400
group by
    n_name
order by
    revenue desc;
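
One way to see the misestimate directly is to strip the query down to its single-table predicates and compare the planner's estimated row count with the actual one under EXPLAIN ANALYZE. A minimal sketch for the customer side (the count(*) wrapper is just to force the filter to be evaluated over the whole table; it is not part of the benchmark query):

explain analyze
select count(*)
from customer
where c_name like '%r#00%'
  and c_acctbal <= 2400;

The scan node in the output reports both the estimated and the actual row count, and the ratio between the two is the estimation error.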
The problem was that PostgreSQL 8.3 chose a plan with many Nested Loop joins, because the selectivity estimates for the predicates on lineitem and customer were off by a large margin. I think this was mainly due to the LIKE pattern matching on c_name. The optimal plan would have used Hash Joins instead.
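
The lineitem side can be checked the same way. As far as I know, the planner falls back to hard-coded default selectivities for column-to-column comparisons such as l_shipdate <= l_receiptdate, so those predicates are another plausible source of error besides the LIKE:

explain analyze
select count(*)
from lineitem
where l_shipdate <= l_receiptdate
  and l_commitdate <= l_shipdate + integer '90'
  and l_extendedprice <= 20000;

To test whether Hash Joins really would have been faster, I believe the usual trick is to run set enable_nestloop = off; for the session, re-run the query under EXPLAIN ANALYZE, and then reset enable_nestloop;.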
Recently I upgraded to PostgreSQL 9.3 for my project and observed that the above query no longer produces a bad plan. I have spent some time trying to find a query with a large cardinality estimation error on TPC-H 1 GB data, so far without success. Do any PostgreSQL experts know of an off-the-shelf query on the TPC-H benchmark, or any other query, that demonstrates a cardinality estimation error in PostgreSQL 9.3?
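
For completeness, any such comparison assumes fresh optimizer statistics; stale statistics can be ruled out beforehand with a plain

analyze;

which recollects statistics for every table (per-table variants such as analyze lineitem; also work, and default_statistics_target controls how detailed the samples are).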