3

I need to optimise this query by using indexing. I tried to index some of the columns, but it is not helping. Is there anyone have thoughts?

The query I need to optimise:

Select vintage, wine_no, wname, pctalc, grade, price, wine.vid, vname, wine.cid, cname
from vineyard, class, wine
where wine.vid = vineyard.vid
and wine.cid = class.cid
and wine.cid = 'SHIRAZ' and grade = 'A';

I tried to created the following indexes: '''create index wine_vid_idx on wine(vid); create index wine_cid_idx on wine(cid); create index wine_grade_idx on wine(grade);```

My execution plan for the original query is:

----------------------------------------------------------------------------------------------
| Id  | Operation                     | Name         | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |              |    42 |  9114 |    10   (0)| 00:00:01 |
|*  1 |  HASH JOIN                    |              |    42 |  9114 |    10   (0)| 00:00:01 |
|   2 |   NESTED LOOPS                |              |    42 |  6930 |     2   (0)| 00:00:01 |
|   3 |    TABLE ACCESS BY INDEX ROWID| CLASS    |     1 |    50 |     1   (0)| 00:00:01 |
|*  4 |     INDEX UNIQUE SCAN         | SYS_C0027457 |     1 |       |     1   (0)| 00:00:01 |
|*  5 |    TABLE ACCESS CLUSTER       | WINE    |    42 |  4830 |     1   (0)| 00:00:01 |

PLAN_TABLE_OUTPUT                                                                                                                                                                                                                                                                                           
---------------------------------------------------------------------------------------------
|   6 |   TABLE ACCESS FULL           | VINEYARD |   160 |  8320 |     8   (0)| 00:00:01 |
Polly
  • 35
  • 5
  • Database performance depends on many different factors, and it's not possible to just look at a query and "tune it". Sure, helpful people will try to guess (and already two people have) but they are just guessing. You need to try things and see what works for you. I urge you to read [this response on optimizing Oracle queries](https://stackoverflow.com/a/34975420/146325). It will explain the range of details we need before we can offer **informed advice**. It may also give you confidence to tackle tuning your query for yourself. – APC Sep 22 '19 at 10:55
  • 1
    If possible, you should consider changing the `CHAR` column type into `VARCHAR` or `VARCHAR2` in order to avoid the various surprises that you may run into when querying against fixed-width columns. Also, depending on the amount of data in your tables, a full-table scan can sometimes be faster than index access. Adding indexes also makes DML slightly slower because the indexes need to be kept up-to-date. Are you having performance problems that you're trying to solve? – Mick Mnemonic Sep 22 '19 at 11:32

3 Answers3

1

Indexing is a frequently effective measure to optimize queries, however, you need to do further steps. Text-based searches tend to be slower in general, so it's highly advisable to modify your class table, so it will have a numeric primary key and to avoid storing texts such as SHIRAZ in your wine table, but rather a numeric foreign key to the class table and store the text SHIRAZ exactly once, for its class record, which would be referenced from the wine table via a numeric value. Also, you should do similarly for grade. If you do not have a grade table yet, create one, with numeric primary key and a field to store values, like A.

Finally, your query is calculating a Descartes multiplication, which, as we know from set theory, does a match for each coordinates in the problem topology. Also, as we know from relational algebra, your where clause will run for all points in your three dimensional (vineyard, class, wine) problem-space. As far as I know, if you would refactor your query to use joins, it should become much quicker, because there are some optimizations for joins to avoid computing all points in your topology.

Let's refactor your current query:

Select vintage, wine_no, wname, pctalc, grade, price, wine.vid, vname, wine.cid, cname
from wine
join class
on wine.cid = class.cid and wine.cid = 'SHIRAZ' and wine.grade = 'A'
join vineyard
on wine.vid = vineyard.vid;

Let's refactor this query to be compatible with your schema after the structural changes I have suggested:

Select vintage, wine_no, wname, pctalc, grade, price, wine.vid, vname, wine.cid, cname
from wine
join class
on wine.class_id = class.id and class.cid = 'SHIRAZ'
join grade
on wine.grade_id = grade.id and grade.value = 'A'
join vineyard
on wine.vid = vineyard.vid;

Also, since it's safe to assume that there are only a few grades and vineyards, we can change the order of introducing the tables into the query:

Select vintage, wine_no, wname, pctalc, grade, price, wine.vid, vname, wine.cid, cname
from grade
join wine
on wine.grade_id = grade.id and grade.value = 'A'
join class
on wine.class_id = class.id and class.cid = 'SHIRAZ'
join vineyard
on wine.vid = vineyard.vid;

There are further steps to do if this is still not enough, let me know if further steps are needed.

Lajos Arpad
  • 64,414
  • 37
  • 100
  • 175
  • I don't understand that "Descartes multiplication" thing. OP is selecting wines, each with their class and vineyard. How does your refactored query change this? You are merely using the current syntax to express the same thing. That shouldn't lead to any difference in the execution plan. And `and wine.cid = 'SHIRAZ' and wine.grade = 'A'` don't belong in the join clause for `join class` of course, because they are referring to the `wine` table. Either put them in the `WHERE` clause or switch `wine` and `class` in your `FROM` clause. – Thorsten Kettner Sep 22 '19 at 11:00
  • There"s plenty of "two-dollar words" in this answer. – APC Sep 22 '19 at 11:05
  • @ThorstenKettner optimization is the act of changing something to something else, identical in nature, but better performing, so I don't view the equivalency in the old and new logic as a problem. Also, you can put those comparisons into the on and the point is to force the engine to execute them when the number of dimensions is smaller, rather than at the end. Please read about Descartes or Cartesian multiplication/product here: http://www.jonathancrabtree.com/mathematics/what-is-descartesian-multiplication/ – Lajos Arpad Sep 22 '19 at 11:16
  • @ThorstenKettner also, it is good to react to your criticism about query plans not changing: I'm sure that the query plan will significantly change if the schema changes as proposed, would be very very surprised if the query plan of a query which uses some columns and tables would be the same with the query plan of a query which was written for a schema where those do not exist. – Lajos Arpad Sep 22 '19 at 11:18
  • I think you are making some false estimations here. **(1)** Do you think Oracle will build a cartesian product of all rows of all tables, before applying the `WHERE` clause? That's wrong. It will join the tables using the `WHERE` clause. Explicit joins are easier to read and show the join criteria in their `ON` clause, but to Oracle it doesn't matter really whether you specify criteria for inner joins in `ON` or `WHERE`. – Thorsten Kettner Sep 22 '19 at 11:47
  • **(2)** What makes you think that numeric keys are so much faster than text keys? Oracle stores numerics in variable multiple bytes. There shouldn't be that much of a difference there. Especially not in binary tree indexes. **(3)** You think by creating a table that contains an attribute we must look up, you speed up the query? How can that be? Before, the wine table contained the class attribute directly, so it could be used in a composite index to get to the desired wine rows immediately. – Thorsten Kettner Sep 22 '19 at 11:47
  • @ThorstenKettner "Do you think Oracle will build a cartesian product of all rows of all tables, before applying the WHERE clause?" I do not assume it. On the contrary, I'm not assuming it will not do it. It's called defensive programming. "What makes you think that numeric keys are so much faster than text keys" if under the hood indexes of non-numeric keys are represented the very same way as in the case of numeric keys, even then there is a difference if the table is very dynamic, like there are a lot of writes occurring. – Lajos Arpad Sep 22 '19 at 13:56
  • @ThorstenKettner "You think by creating a table that contains an attribute we must look up, you speed up the query?". I do. I did not implement Oracle, so I do not know whether under the hood indexes are traversable in the exact same speed if they are numbers or texts, however, a table having 5 or 6 records, having grade, loaded and filtered at the start of the query or close to its start will not make things slow. So, if writing rows where textual data is indexed is slower and hence slows down reads, then we have a small cost and a large gain. – Lajos Arpad Sep 22 '19 at 13:59
1

To start with: you are using an old join syntax (of the 1980s actually). This is how we would write the query today:

Select
   w.vintage, w.wine_no, w.wname, w.pctalc, w.grade, w.price, w.vid, v.vname, w.cid, c.cname
from wine w
join vineyard v on v.vid = w.vid
join class c on c.cid = w.cid
where w.cid = 'SHIRAZ' 
and w.grade = 'A';

Here a mere glimpse at the WHERE clause suffices to see that you are looking for wines matching a class and grade. So, have an index on the two columns. Order may matter, so provide two indexes. Extend this with the vineyard ID, so as to get quickly to the vineyard table.

As to class and vineyard, you should already have indexes on their IDs. You may want to add the one column for each table you are selecting, so the DBMS can get the values directly from the index.

create index idx01 on wine ( cid, grade, vid );
create index idx02 on wine ( grade, cid, vid );
create index idx03 on class ( cid, cname );
create index idx04 on vineyard ( vid, vname );

Use the execution plan to detect indexes that are not used (the query will only use either idx01 or idx02 or even none of these, not both) and drop them.

Thorsten Kettner
  • 89,309
  • 7
  • 49
  • 73
0

This is your query:

select w.vintage, w.wine_no, w.wname, w.pctalc, w.grade,
       w.price, w.vid, v.vname, w.cid, c.cname
from wine w join
     vineyard v
     on v.vid = w.vid join
     class c
     on c.cid = w.cid
where w.cid = 'SHIRAZ' and
      w.grade = 'A';

All the joins are inner joins, all the filtering is on one table, and the conditions are equality conditions. Hence, that table should be the first one accessed. You then want joins on the filtering conditions and the corresponding join keys of the other other tables:

  • wine(cid, grade, vid) (the first two keys can be in either order).

The join keys of the other tables vineyard(vid) and class(cid) are already indexed, because these are declared are primary keys. Hence, the above is the only additional index you need.

Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786