
I got this query from a fellow programmer, but it executes extremely slowly: it takes about 4 seconds to run.

Can this query be optimized to give the same result, but perform better?

SELECT checkedcrates, 
   pv.name as powervision, 
   cg.name as cgrp, 
   cs.name as csz, 
   c.name as cname
   FROM  public.inspectionresultsstatistic e, 
        crates c, 
        powervisions pv, 
        lines l, 
        quality q, 
        cratesgroupscrates cgc, 
        cratesgroups cg, 
        cratessizes cs 
   where
        c.id = e.crateid 
        and l.id = e.lineid 
        and pv.id = l.powervisionid
        and q.id = e.qualityid 
        and c.id = cgc.crateid 
        and cs.id = cgc.cratesizeid 
        and cg.id = cgc.crategroupid 
        and qualityid = 0
        and pv.name in ('PV101')  
        and c.name in ('24603','104','136','154','186','106','156','216','246','206')
        and cg.name in ('Black','Blue','DLL','Green')
        and cs.name in ('30x40','60x40')
        and to_timestamp(e.startts) >= '2021-10-18T17:45:22Z' 
        and to_timestamp(e.stopts-1) <= '2021-10-18T19:45:22Z'
   group by 
     powervision, 
     cgrp, 
     csz, 
     cname, 
     checkedcrates, 
     startts
 

EDIT: I just noticed that it is the inner select that is slow. I have updated the query above by removing the outer query.

EDIT2: maybe I should add some indexes? I have an index on every ID column that connects one table to another, and on inspectionresultsstatistic I have an index on id + qualityid + startts + stopts.
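For reference, a sketch of an index matching this query's equality and range filters (the index name is made up; column names are taken from the query). Note that, per the comments below, such an index can only be used on startts once the to_timestamp() wrapper is removed from the WHERE clause:

    CREATE INDEX inspectionresultsstatistic_quality_start_idx
        ON inspectionresultsstatistic (qualityid, startts);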

EDIT3: as requested, here is more information about my tables and the data.

I am using PostgreSQL 12.

the table structures are as follows:

http://sqlfiddle.com/#!17/bb5a6/1

All tables are rather small, with fewer than 50 entries each, except for inspectionresultsstatistic, which contains about 12,000,000 rows.

sharkyenergy
    The path to optimization here is adding useful indexes to your tables. To help you work this out, we need a bunch more information. Please tag your question with the DBMS you use ([tag:oracle], [tag:mysql], [tag:sql-server], [tag:postgresql]). And, please [read this](https://stackoverflow.com/tags/query-optimization/info), then [edit] your question. – O. Jones Nov 02 '21 at 13:20
  • The construct `to_timestamp(column)` defeats any index on `column` by making your WHERE clause [non-sargable](https://stackoverflow.com/questions/799584/what-makes-a-sql-statement-sargable). It's impossible to give you decent advice without knowing the data types of your columns. – O. Jones Nov 02 '21 at 13:32
  • @O.Jones OK, thank you, updating ASAP. Is there a workaround to the to_timestamp problem? Can this be put on the right side of the comparison? – sharkyenergy Nov 02 '21 at 13:47
  • Yes, the to_timestamp() clauses can be refactored. But not without knowing what's in those columns. – O. Jones Nov 02 '21 at 13:57
  • @O.Jones I updated the post with a SQL fiddle that contains all the tables. All tables are rather small with less than 50 entries, except for inspectionresultsstatistic, which contains about 12.000.000 rows. – sharkyenergy Nov 02 '21 at 14:24
  • Presumably this query was written 30 years ago; implementing explicit join syntax will help eliminate any unforeseen cartesian or cross join scenarios. `to_timestamp(e.stopts-1)` is non-sargable and will probably be forcing a table/index *scan*. – Stu Nov 02 '21 at 14:58
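Sketching the refactoring described in the comments above (assuming startts and stopts hold Unix epoch seconds as int8): compare the raw columns against precomputed epoch constants instead of wrapping the columns in to_timestamp(), so a plain b-tree index on the columns can be used:

    -- Sargable form: the columns stay bare, the constants are converted once.
    AND e.startts >= extract(epoch FROM timestamptz '2021-10-18T17:45:22Z')
    AND e.stopts - 1 <= extract(epoch FROM timestamptz '2021-10-18T19:45:22Z')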

2 Answers


My big tell here is that your joins are not written with the more modern explicit syntax. When you use commas between tables in your FROM clause, it creates a Cartesian product of the tables, which can be much more resource-intensive than using INNER JOIN followed by an ON clause. The join condition should go in the ON clause instead of being a constraint in the WHERE clause.

Something like:

   SELECT checkedcrates, 
    pv.name as powervision, 
    cg.name as cgrp, 
    cs.name as csz, 
    c.name as cname
FROM  public.inspectionresultsstatistic e
    INNER JOIN crates c
        ON c.id = e.crateid 
    INNER JOIN lines l
        ON l.id = e.lineid 
    INNER JOIN powervisions pv
        ON pv.id = l.powervisionid
    INNER JOIN quality q
        ON q.id = e.qualityid
    INNER JOIN cratesgroupscrates cgc
        ON c.id = cgc.crateid
    INNER JOIN cratesgroups cg
        ON cg.id = cgc.crategroupid
    INNER JOIN cratessizes cs 
        ON cs.id = cgc.cratesizeid
where qualityid = 0
    and pv.name in ('PV101')  
    and c.name in ('24603','104','136','154','186','106','156','216','246','206')
    and cg.name in ('Black','Blue','DLL','Green')
    and cs.name in ('30x40','60x40')
    and to_timestamp(e.startts) >= '2021-10-18T17:45:22Z' 
    and to_timestamp(e.stopts-1) <= '2021-10-18T19:45:22Z'
group by 
    powervision, 
    cgrp, 
    csz, 
    cname, 
    checkedcrates, 
    startts
Clifford Piehl
  • Actually, Oracle's cost-based query planner recognizes and handles your grandfather's comma-joins with the join criteria in WHERE clauses with the same performance as joins with ON clauses. ON clauses are still far easier to read. – O. Jones Nov 02 '21 at 13:22
  • Thanks for the answer. I just tried to run your query and it takes the same amount of time. Maybe I should create some indexes? What indexes should I create to speed this specific query up? (Updating the question to include this.) – sharkyenergy Nov 02 '21 at 13:24
  • @O.Jones And that is why Oracle is 5 figures per core! I have seen tangible performance increases using SQL Server, but it does not seem to help in this case :( – Clifford Piehl Nov 02 '21 at 13:42
  • Other DBMS makes and models also handle comma-join syntax efficiently, including sql-server, mysql, and postgresql. You're right that Oracle is astoundingly expensive. – O. Jones Nov 02 '21 at 13:46
  • Planners have no trouble with the join conditions in the WHERE. It is the humans who write and read queries that benefit from having joins broken out into explicit ON. – jjanes Nov 02 '21 at 16:04

Your fiddle doesn't include the indexes. You should use db-fiddle.com rather than sqlfiddle, as the latter is essentially broken, having nothing newer than 9.6 to offer.

The time range being queried is pretty small, so presumably it is very selective.

You would be better querying that as a range:

and tstzrange(to_timestamp(e.startts),to_timestamp(e.stopts-1),'[]') <@ 
    tstzrange('2021-10-18T17:45:22Z','2021-10-18T19:45:22Z','[]')

Which would benefit from an expression index:

    create index on inspectionresultsstatistic
        using gist (tstzrange(to_timestamp(startts), to_timestamp(stopts-1), '[]'));

But really, you are probably better off storing your timestamps as timestamptz rather than int8, or maybe even storing them as a tstzrange directly.
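A sketch of that one-off conversion (assuming the columns currently hold Unix epoch seconds; note this rewrites the whole 12-million-row table and takes an exclusive lock, so it needs a maintenance window):

    -- Convert the epoch columns to timestamptz once, so queries no longer
    -- need to call to_timestamp() on every row.
    ALTER TABLE inspectionresultsstatistic
        ALTER COLUMN startts TYPE timestamptz USING to_timestamp(startts),
        ALTER COLUMN stopts  TYPE timestamptz USING to_timestamp(stopts);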

jjanes