74

I'm working on a web project with MySql database on Java EE. We needed a view to summarize data from 3 tables with over 3M rows overall. Each table was created with index. But I haven't found out a way to take advantages in the indexes in the conditional select statement retrieval from the view that we created with [group by].

I've getting suggestions from people that using views in MySql is not a good idea. Because you can't create index for views in mysql like in oracle. But in some test that I took, indexes can be used in view select statement. Maybe I've created those views in a wrong way.

I'll use a example to describe my problem.

We have a table that records data for high scores in NBA games, with index on column [happend_in]

CREATE  TABLE `highscores` (
   `tbl_id` int(11) NOT NULL auto_increment,
   `happened_in` int(4) default NULL,
   `player` int(3) default NULL,
   `score` int(3) default NULL,
   PRIMARY KEY  (`tbl_id`),
   KEY `index_happened_in` (`happened_in`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

insert data(8 rows)

INSERT INTO highscores(happened_in, player, score)
VALUES (2006, 24, 61),(2006, 24, 44),(2006, 24, 81),
(1998, 23, 51),(1997, 23, 46),(2006, 3, 55),(2007, 24, 34), (2008, 24, 37);

then I create a view to see the highest score that Kobe Bryant got in each year

CREATE OR REPLACE VIEW v_kobe_highScores
AS
   SELECT player, max(score) AS highest_score, happened_in
   FROM highscores
   WHERE player = 24
   GROUP BY happened_in;

I wrote a conditional statement to see the highest score that kobe got in 2006;

select * from v_kobe_highscores where happened_in = 2006;

When I explain it in toad for mysql, I found out that mysql have scan all rows to form the view, then find data with condition in it, without using index on [happened_in].

explain select * from v_kobe_highscores where happened_in = 2006;

explain result

The view that we use in our project is built among tables with millions of rows. Scanning all the rows from table in every view data retrieval is unacceptable. Please help! Thanks!

@zerkms Here is the result I tested on real-life. I don't see much differences between. I think @spencer7593 has the right point. The MySQL optimizer doesn't "push" that predicate down in the view query. real-life test

gotgenes
  • 38,661
  • 28
  • 100
  • 128
Roger Ray
  • 1,251
  • 2
  • 12
  • 20

3 Answers3

63

How do we get MySQL (or MariaDB) to use an index for a view query? The short answer, we provide an index that MySQL can use.

In this case, the optimum index is likely a "covering" index: Likely, the optimum index will turn out to be a "covering index" e.g.

CREATE INDEX highscores_IX3 ON highscores (player, happened_in, score)

and we expect MySQL optimizer will be able to use that index for the SELECT. We test just the SELECT first (without the CREATE VIEW) and EXPLAIN will show: "Using index" due to the WHERE player = 24 (in database lingo, that's an equality predicate on the leading column in the index. The GROUP BY happened_id (the second column in the index), may allow MySQL to avoid a (potentially expensive) sort operation, allowing the rows to be returned in order from the index.

Including the score column in the index will allow the query to satisfied entirely from the index pages, without having to visit (lookup) the data pages referenced by the index.

That's the quick answer. The longer answer is that MySQL is very unlikely to use an index with leading column of happened_id for the view query.


Why the view causes a performance issue

One of the issues you have with the MySQL view is that MySQL does not "push" the predicate from the outer query down into the view query.

Your outer query specifies WHERE happened_in = 2006. The MySQL optimizer does not consider the predicate when it runs the inner "view query". That query for the view gets executed separately, before the outer query. The resultset from the execution of that query get "materialized"; that is, the results are stored as an intermediate MyISAM table. (MySQL calls it a "derived table", and that name they use makes sense, when you understand the operations that MysQL performs.)

The bottom line is that the index you have defined on happened_in is not being used by MySQL when it rusn the query that forms the view definition.

After the intermediate "derived table" is created, THEN the outer query is executed, using that "derived table" as a rowsource. It's when that outer query runs that the happened_in = 2006 predicate is evaluated.

Note that all of the rows from the view query are stored, which (in your case) is a row for EVERY value of happened_in, not just the one you specify an equality predicate on in the outer query.

The way that view queries are processed may be "unexpected" by some, and this is one reason that using "views" in MySQL can lead to performance problems, as compared to the way view queries are processed by other relational databases.


Improving performance of the view query with a suitable covering index

Given your view definition and your query, about the best you are going to get would be a "Using index" access method for the view query. To get that, you'd need a covering index, e.g.

... ON highscores (player, happened_in, score).

That's likely to be the most beneficial index (performance wise) for your existing view definition and your existing query. The player column is the leading column because you have an equality predicate on that column in the view query. The happened_in column is next, because you've got a GROUP BY operation on that column, and MySQL is going to be able to use this index to optimize the GROUP BY operation. We also include the score column, because that is the only other column referenced in your query. That makes the index a "covering" index, because MySQL can satisfy that query directly from index pages, without a need to visit any pages in the underlying table. And that's as good as we're going to get out of that query plan: "Using index" with no "Using filesort".


Compare performance to standalone query with no derived table

You could compare the execution plan for your query against the view vs. an equivalent standalone query:

SELECT player
     , MAX(score) AS highest_score
     , happened_in
 FROM highscores
WHERE player = 24
  AND happened_in = 2006
GROUP
   BY player
    , happened_in

The standalone query can also make use of a covering index e.g.

... ON highscores (player, happened_in, score)

but without a need to materialize an intermediate MyISAM table.


I am not sure that any of the previous provides a direct answer to the question you were asking.

Q: How do I get MySQL to use an INDEX for view query?

A: Define a suitable INDEX that the view query can use.

The short answer is provide a "covering index" (index includes all columns referenced in the view query). The leading columns in that index should be the columns that are referenced with equality predicates (in your case, the column player would be a leading column because you have a player = 24 predicate in the query. Also, the columns referenced in the GROUP BY should be leading columns in the index, which allows MySQL to optimize the GROUP BY operation, by making use of the index rather than using a sort operation.

The key point here is that the view query is basically a standalone query; the results from that query get stored in an intermediate "derived" table (a MyISAM table that gets created when a query against the view gets run.

Using views in MySQL is not necessarily a "bad idea", but I would strongly caution those who choose to use views within MySQL to be AWARE of how MySQL processes queries that reference those views. And the way MySQL processes view queries differs (significantly) from the way view queries are handled by other databases (e.g. Oracle, SQL Server).

spencer7593
  • 106,611
  • 15
  • 112
  • 140
  • 1
    If MySQL doesn't has a way to push the predicate from the outer query into the view query, it look like instead create a view execute specific query in particular circumstances is the best way to do it. – Roger Ray Dec 19 '12 at 03:47
  • "MySQL doesn't push the predicate from the outer query into the view query" --- if it doesn't how would you explain the second line in the `EXPLAIN`? – zerkms Dec 19 '12 at 03:59
  • @zerkms: I explain the use of the index on the second line as a result of the optimization for the GROUP BY. Note that the query processed 8 rows, which is ALL of the rows in the table, and not all rows satisfy the `happened_in = 2006` predicate. I explain the "Using where" on that second line as referring to the `player = 24` predicate in the view query, not the predicate from the outer query. Note that the first line in the explain plan shows 'Using where', that's in reference to the `happened_in = 2006` predicate. (My understanding here may be incomplete, I avoid using views in MySQL.) – spencer7593 Dec 19 '12 at 04:58
  • @zerkms: Also see that row count on the first line of the explain? That shows 3 rows; if you look at the view query, you will see that there are three distinct values of `happened_in` for `player = 24`. IF the `happened_in = 2006` predicate had been pushed into the view query, then there would be only one row in the derived table, and we would expect explain to report a row count of 1 from it. – spencer7593 Dec 19 '12 at 05:03
  • "and we would expect explain to report a row count of 1 from it" --- explain doesn't show accurate number of rows scanned, it's a *prediction*. But seems like you're right. Posgresql and oracle do the predicate push for view query, didn't expect mysql optimizer to be so silly. – zerkms Dec 19 '12 at 05:49
  • @zerkms: but note that MySQL EXPLAIN _does_ materialize the derived tables (inline views) before it predicts the execution plan for the outermost query. Yes, Oracle does a decent job of pushing predicates into view queries. But the MySQL optimizer does not, which is a big reason we generally avoid using views in MySQL. – spencer7593 Dec 19 '12 at 05:57
  • @zerkms: an interesting test would be to include that predicate from the outer query, to manually "push" that into the view definition, by including it in the view query. Then we could take a look at the EXPLAIN output, and I strongly suspect that the rowcount from the derived table would be "predicted" to be a 1 vs. 3. That test result would give further credence to the idea that MySQL does not "push" predicates from outer queries into a view query (either stored or inline). – spencer7593 Dec 19 '12 at 15:33
  • 1
    Really appreciate for spending your time to give me such a specific answer. It's really helpful! I think MySQL should change the way of processing view queries. If we have to use an equivalent standalone query instead of a conditional view query in some situations, does it means that the view part of MySQL architectural was not well designed? – Roger Ray Dec 23 '12 at 04:40
  • 1
    No, I don't think it means it was not well designed. MySQL just has different design goals than other DBMSs. MySQL supports "inline views" in the same way it does "stored views". MySQL always materializes that resultset from a view query (whether inline or stored) as an intermediate MyISAM table. I think it's "designed" just fine. The problem is in not understanding that design, and expecting MySQL to work differently. So, we can use views in MySQL, but carefully, not blindly without understanding how views work.) – spencer7593 Dec 23 '12 at 04:52
  • 1
    [Cookbook](http://mysql.rjweb.org/doc.php/index_cookbook_mysql) for building indexes from SELECTs. – Rick James Nov 26 '15 at 07:09
  • 1
    According to the MySQL documentation, it sounds like the optimizer will indeed "push" the predicate down to the view but only when the view is being used with ALGORITHM=MERGE, which can only be used in very specific situations. If the view contains GROUP BY, aggregate functions, LIMIT, etc. the view will be materialized instead. (So pretty, much anything that a view would be useful for.) See the bottom of this page here: https://dev.mysql.com/doc/refman/5.6/en/view-algorithms.html – Code Commander Jul 12 '19 at 17:24
  • According to the MySQL documentation: `MERGE` cannot be used if the view contains any of the following constructs: Aggregate functions ( ..., `MAX()`, (The answer could have been more explicit as to the reason why MySQL isn't pushing the predicate into OP view) – spencer7593 Jul 12 '19 at 18:46
3

Creating the composite index with player + happened_in (in this particular order) columns is the best you can do in this case.

PS: don't test mysql optimizer behaviour on such small amount of rows, because it's likely to prefer fullscan over indexes. If you want to see what will happen in real life - fill it with real life-alike amount of data.

zerkms
  • 249,484
  • 69
  • 436
  • 539
  • The best result I want to get is that retrieve data in view without scan all the rows in table [highscores]. Using index like select conditional search. composite index with player + happened_in can only help scan all the rows in table, which I'm trying to avoid. Is there another to create a view so that index of table can be use in data retrieval upon view? – Roger Ray Dec 19 '12 at 03:23
  • @Roger Ray: have you actually **tried** what I suggested? "can only help scan all the rows in table, which I'm trying to avoid" --- that's not entirely correct. It will help to *avoid* the table scan – zerkms Dec 19 '12 at 03:57
  • Of course I've **tried** what you suggested. Instead of scan all 8 rows, with this composite index, it scanned 6 rows, then query with the condition of [happened_in = 2006]. It did save some time, But the table I'm dealing with now has over 1M rows. Scanning from them without the condition of [happened_in = 2006] is still too much cost. Plus, creating a composite index in a table with large amount of data probably isn't a very good idea. Thanks for your suggestion though. – Roger Ray Dec 19 '12 at 05:39
  • @Roger Ray: show the explain with new index for the 1M rows table. "Plus, creating a composite index in a table with large amount of data probably isn't a very good idea" --- who told you so? You need as many indexes as you need. And as I said in my answer - always check how optimizer behaves on real life-alike amount of data. The behaviour on 1M and 8 records may differ much. – zerkms Dec 19 '12 at 05:44
  • I've put a image of test result in real-life at the bottom of my question if you want to take a look~ – Roger Ray Dec 19 '12 at 06:08
  • @Roger Ray: have you created the composite index? I see the same key length used :-S – zerkms Dec 19 '12 at 06:11
  • @Roger Ray: now the length value is too big :-S – zerkms Dec 19 '12 at 07:30
  • That's why "creating a composite index" under this circumstances is not such a good idea. – Roger Ray Dec 19 '12 at 08:08
  • @Roger Ray: "it depends". Composite index of 2 integers is not a bad thing. – zerkms Dec 19 '12 at 09:09
  • 2
    Actually, a "covering" index `ON (player_id, happened_in, score)` would be an even better index for the view query; that would allow the `GROUP BY` operation to be optimized, avoiding a sort operation, as well allowing the query to be satisfied entirely from the index pages, without need to visit any data pages. – spencer7593 Dec 20 '12 at 21:20
1

This doesn't directly answer the question, but it is a directly related workaround for others running into this issue. This achieves the same benefits of using a view, while minimizing the disadvantages.

I setup a PHP function to which I can send parameters, things to push into the inside to maximize index usage, rather than using them in a join or where clause outside a view. In the function you can formulate the SQL syntax for a derived table, and return that syntax. Then in the calling program, you can do something like this:

$table = tablesyntax(parameters);
select field1, field2 from {$table} as x... + other SQL

Thus you get the encapsulation benefits of the view, the ability to call it as if it is a view, but not the index limitations.

ray.gurganus
  • 119
  • 1
  • 2
  • 10