0

I'm troubleshooting a very slow index on mariaDB. It takes over 10 seconds. The table has over 10M rows. The where clause has 'where a=x and b=y'. Column b has a selective index. Column a has a non selective index (only 5 distinct values), but it's can't be deleted since a is a foreign key. mariaDB is using intersect of the 2 indices, which make it perform a lot worse than using index on b alone. I don't know how to solve this given that:

  • I can't delete the index on a since I want to ensure integrity
  • I don't want to cancel the intersect index optimization for the entire database
  • I don't want to use index hints since it's not standard (even though I'm not sure what other option I got).

Any ideas?

  • We need things like the output of `SHOW CREATE TABLE` for the relevant tables. Maybe your computer's on fire and is CPU throttled, or maybe it's using 5400 RPM drives, we have no idea what might be going on here. "It's slow" is an observation, not a diagnostic we can work with. – tadman May 01 '18 at 20:04
  • 1
    Thanks tadman, I'm adding some outputs below. However the question is general: a foreing key automatically adds an index, this index can be non selective (which is often the case for parent table), and then the optimizer is using intersect which harm performance if one of the indices is non selective. – Itamar Tayer May 01 '18 at 20:10
  • 1
    It's worth editing the question and adding that content with the proper formatting. It doesn't show up well in comments. – tadman May 01 '18 at 20:17
  • Have you already tried compound indexes on the tables? – sticky bit May 01 '18 at 22:36

1 Answers1

0

a foreing key automatically adds an index...

Well... in MariaDB, yes. This is not a standard feature by any means; other databases (Oracle, DB2, PostgreSQL, etc.) do not automatically create indexes to enforce foreign keys.

I fail to fully understand your question but it seems to me you have two indexes, one using column a and another using column b. And... you are trying to decide which one to use or how to combine them.

It's not difficult to make the query you mention to use an index. If the WHERE condition is where a=x and b=y, this is a perfect case to be heavily accelerated by creating the index (b, a), as in:

create index ux_mytable_b_a on my_table (b, a);

Note I used (b, a) and not (a, b). This way the most selective column is placed first, and the query runs less chances of a hash collision and/or a bucket overflow.

Now, regarding your observation:

mariaDB is using intersect of the 2 indices, which make it perform a lot worse...

Please note that having a combined third index (with two columns) avoids an index intersection. MariaDB will automatically choose the new index since it's faster than using any combination of the previous ones.

The Impaler
  • 45,731
  • 9
  • 39
  • 76
  • Thanks. Indeed I ended up adding a combined index. Just to clarify the situation: in mysql and mariadb foreign keys creates indexes automatically. such indexes are often non selective. however there is no way to remove them (and keep the foreign key) or to disable them as possible keys to the optimizer. This cause the optimizer to do intersection with non selective indexes, which perform poorly comparing to using a single selective index. – Itamar Tayer May 02 '18 at 09:34
  • MariaDB does not use hashing or buckets. `(b,a)` and `(a,b)` are equally good, regardless of the _individual_ cardinalities. – Rick James May 10 '18 at 04:27
  • @ItamarTayer - If you already have `INDEX(a,b)` and add `FOREIGN KEY ... REFERENCES ... a ...`, it _should_ say that the composite index will suffice. – Rick James May 10 '18 at 04:29
  • @RickJames Point taken. MariaDB does not use hash tables. Nevertheless `b,a` is faster than `a,b` since `b` is more selective than `a`. That is, filtering by `b` values first heavily reduces the candidate rows, compared to filtering by `a` first. – The Impaler May 10 '18 at 14:45
  • @TheImpaler - But it is not filtering on one, then the other. It is filtering on the combination by using a _single_ drill-down in a BTree. It's equivalent to concatenating the two columns, then having a single index on that combined column. – Rick James May 10 '18 at 15:10
  • @TheImpaler - If you want to debate that point, here is a Question on such: https://stackoverflow.com/questions/12315496/which-column-to-put-first-in-index-higher-or-lower-cardinality – Rick James May 10 '18 at 15:15
  • @RickJames You are right. For some reason, I always tend to consider range scans when querying, but this was just an equality seek. I guess my mind gravitates towards the worst case scenario. – The Impaler May 10 '18 at 16:50