0

I have a DB2 query on the TPC-H schema, which I'm trying to optimize with indexes. I can tell from db2expln output that the estimated costs are significantly lower (300%) when indexes are available. However I'm also trying to measure execution time and it doesn't change significantly when the query is executed using indexes.

I'm working in an SSH terminal, executing my queries and writing output to a file like

(time db2 "SELECT Q.name FROM 
(SELECT custkey, name FROM customer WHERE nationkey = 22) Q WHERE Q.custkey IN 
(SELECT custkey FROM orders B WHERE B.orderkey IN 
(SELECT orderkey FROM lineitem WHERE receiptdate BETWEEN '1992-06-11' AND '1992-07-11'))") &> output.txt 

I did 10 measurements each: 1) without indexes, 2) with index on lineitem.receiptdate, 3) with indexes on lineitem.receiptdate and customer.nationkey, calculated average time and standard deviation, all are within the same range. I executed RUNSTATS ON TABLE schemaname.tablename AND DETAILED INDEXES ALL after index creation.

I read this post about output of the time command, from what I understand sys+user time should be relevant for my measurement. There is no change in added sys+user time, not either in real.

sys + user is around 44 ms, real is around 1s.

Any hints why I cannot see a change in time? Am I interpreting time output wrong? Are the optimizer estimations in db2expln misleading?

Disclaimer: I'm supposed to give a presentation about this at university, so it's technically homework, but as it's more of a comprehension question and not "please make my code work" I hope it's appropriate to post it here. Also, I know the query could be simplified but my question is not about this.

Community
  • 1
  • 1
prinz_minz
  • 49
  • 1
  • 8
  • 1
    How long does the query take? `time` won't be reliable for anything under several hundred ms. – that other guy Jul 10 '16 at 17:50
  • edited, so maybe my query is too short. it's because time's own execution time is too long, like explained [here](http://stackoverflow.com/questions/9006596/is-the-unix-time-command-accurate-enough-for-benchmarks)? – prinz_minz Jul 10 '16 at 18:58
  • You also need to understand that `time` measures the system + user CPU time for the process it executes. In this case, it's measuring the CPU time for the `db2` front end process. This process communicates (via IPC) with another process (the DB2 Backend Process – `db2bp`), which communicates via either TCP/IP or IPC with the DB2 database engine. No CPU time consumed by the backend process or the database engine will be reported in the `sys` + `user` time. – Ian Bjorhovde Jul 11 '16 at 04:36
  • thank you for adding this. so `real` provides more signifcant information about execution time than `sys`+`user` as changes in the duration of the background process would be visible in the former, but not in the latter? – prinz_minz Jul 11 '16 at 13:07

2 Answers2

1

The optimizer estimates timerons (measurement of internal costs) and these timerons cannot be translated one to one into query execution time. So a difference of 300% in timerons does not mean you will see a 300% difference in runtime.

Measuring time for one or more SQL statements I recommend to use db2batch with the option

-i complete

MichaelTiefenbacher
  • 3,805
  • 2
  • 11
  • 17
-1

`SELECT f1.name FROM customer f1 WHERE f1.nationkey = 22 and exists (

select * from orders f2 inner join lineitem f3 on f2.orderkey=f3.orderkey where f1.custkey=f2.custkey and f3.receiptdate BETWEEN '1992-06-11' AND '1992-07-11'

)`

Esperento57
  • 16,521
  • 3
  • 39
  • 45