2

My question looks similar to this one: How can I SELECT rows with MAX(Column value), DISTINCT by another column in SQL?, but I've some specific issues.

Let's assume that we have such table:

| title_id |  title   | type_id | crit_id |
+----------+----------+---------+---------+
|    1     | title_A1 |     0   |   1111  |
|    2     | title_A2 |     1   |   1111  |
|    3     | title_B1 |    50   |   2222  |  
|    4     | title_B2 |    50   |   2222  |
|    5     | title_C1 |    72   |   3333  |
|    6     | title_C2 |     1   |   3333  |
|    7     | title_C4 |     0   |   3333  |

"title_id" is unique and indexed, "title" and "crit_id" are indexed.

As a result I'd like to have only rows grouped by "crit_id", holding the minimum value of custom rank (priority) that is calculated on type_id. For e.q. rank for type_id looks like:

type_id = 0 - rank = 10
type_id = 50 -rank = 11
type_id = 1 - rank = 15
type_id = 72- rank = 35
etc...

And at the end everything should be ordered alphabetically by "title" According to requirements results should be:

| title_id |  title   | type_id | crit_id | rank |
+----------+----------+---------+---------+------+
|     1    | title_A1 |    0    |  1111   |  10  | 
|     3    | title_B1 |   50    |  2222   |  11  |
|     7    | title_C4 |    0    |  3333   |  10  |

I'm using SQLITE. I can get required results with query:

SELECT *, MIN(CASE WHEN type_id = 0 THEN 10
                   WHEN type_id = 1 THEN 11
                   WHEN type_id = 50 THEN 15
                   WHEN type_id = 72 THEN 35
                   ELSE 1000 END) as rank
FROM titles WHERE ... GROUP BY crit_id ORDER BY title

This query has really bad performance. On a 1 000 000 records it executes more than 10 seconds.

Here are couple problems:

  1. We have about 60% of records with type_id == 0. And in this case we are executing MIN and CASE clauses about 600 000 times. As rank is calculated we can't use index here. I'd like some how minimize execution of it.
  2. Using GROUP BY on such amount of data brings really bad performance. And after reading Selecting records holding group-wise maximum (on a unique column), I'm not sure if it always have correct behaviour. Hope that there is another way that does similar thing as Group By.

PS: I'm running this on embeded device, with slow memory card, thus access to DB is so slow.

I'm not SQL expert, so if there exist any solution I'd appreciate it. Thanks in advance.

Forgot to mention, we can LIMIT amount of results that should be returned. For e.q. LIMIT 500.

Community
  • 1
  • 1
vol_ya
  • 21
  • 2
  • 3
    `select *` with `group by` almost never does what the writer intends. – Gordon Linoff May 12 '16 at 22:22
  • is the typeid mapping to rank in a table? – guido May 12 '16 at 22:24
  • Try to things: (1) use the other `CASE` syntax where you first specify the value to test and then have the different `WHEN` clauses, and then (2) move the `MIN` function inside of the `CASE`, so to get: `CASE MIN(type_id) WHEN 0 THEN 10 WHEN ...` – trincot May 12 '16 at 22:28
  • What should `title_id` and other columns give when there are different values for the same `crit_id` value? In the SQL standard (applied in MySql 5.7 by default) you need to either aggregate `title_id` somehow (e.g. `min(title_id)`), or also group by it. Same for other columns that are not functionally determined by `crit_id`. – trincot May 12 '16 at 22:34
  • The `...` in your `WHERE` clause may be an important part of your performance measurement. If you are selecting a relatively small number of rows out of the entire table then an index on one or more of the filter expressions may give you the biggest performance improvement. – D Stanley May 12 '16 at 22:42
  • 1
    Also, querying 1MM records on an embedded device from a memory card in 10 seconds doesn't sound "really bad". If I/O or CPU is your bottleneck then changing the query may not help much at all. – D Stanley May 12 '16 at 22:45
  • Show the database schema and the output of [EXPLAIN QUERY PLAN](http://www.sqlite.org/eqp.html). – CL. May 13 '16 at 07:47
  • > is the typeid mapping to rank in a table? Nope, this is logic in application that I'm trying to move on SQL level. I can create only a "view" that will map type_id to rank. – vol_ya May 13 '16 at 07:54
  • About `WHERE` clause, you are right it is much faster if we provide it, but unfortunately I described the worst case. And the requirement is to be under 500ms at least, no matters if we have something in `WHERE` or not. – vol_ya May 13 '16 at 08:00
  • EXPLAIN QUERY PLAN looks like this: SCAN TABLE TITLES USING INDEX IDX_CRIT_ID; EXECUTE LIST SUBQUERY 1; USE TEMP B-TREE FOR ORDER BY; – vol_ya May 13 '16 at 08:06

1 Answers1

0

Perhaps you can take advantage of a composite index on (type_id, crit_id) by splitting your queries with union all:

select crit_id, min(rank) from (
   select distinct crit_id, 10 as rank
   from titles where type_id = 0
   union all
   select distinct crit_id, 11
   from titles where type_id = 1
   union all
   select distinct crit_id, 15
   from titles where type_id = 50
   union all
   select distinct crit_id, 35
   from titles where type_id = 72
   union all
   select distinct crit_id, 1000
   from titles where type_id not in (0,1,50,72)
) t group by crit_id
FuzzyTree
  • 32,014
  • 3
  • 54
  • 85