14

I have a very large (80+ million row) de-normalized MySQL table. A simplified schema looks like:

+-----------+-------------+--------------+--------------+
|    ID     |   PARAM1    |   PARAM2     |   PARAM3     |
+-----------+-------------+--------------+--------------+
|    1      |   .04       |    .87       |    .78       |
+-----------+-------------+--------------+--------------+
|    2      |   .12       |    .02       |    .76       |
+-----------+-------------+--------------+--------------+
|    3      |   .24       |    .92       |    .23       |
+-----------+-------------+--------------+--------------+
|    4      |   .65       |    .12       |    .01       |
+-----------+-------------+--------------+--------------+
|    5      |   .98       |    .45       |    .65       |
+-----------+-------------+--------------+--------------+

I'm trying to see if there's a way to optimize a query in which I apply a weight to each PARAM column (where weight is between 0 and 1) and then average them to come up with a computed value SCORE. Then I want to ORDER BY that computed SCORE column.

For example, assuming the weighting for PARAM1 is .5, the weighting for PARAM2 is .23 and the weighting for PARAM3 is .76, you would end up with something similar to:

SELECT ID, ((PARAM1 * .5) + (PARAM2 * .23) + (PARAM3 * .76)) / 3 AS SCORE 

ORDER BY SCORE DESC LIMIT 10

With some proper indexing, this is fast for basic queries, but I can't figure out a good way to speed up the above query on such a large table.

Details:

  • Each PARAM value is between 0 and 1
  • Each weight applied to the PARAMS are between 0 and 1 s

--EDIT--

A simplified version of the problem follows.

This runs in a reasonable amount of time:

SELECT value1, value2 
FROM sometable 
WHERE id = 1 
ORDER BY value2

This does not run in a reasonable amount of time:

 SELECT value1, (value2 * an_arbitrary_float) as value3 
 FROM sometable 
 WHERE id = 1 
 ORDER BY value3

Using the above example, is there any solution that allows me to do an ORDER BY with out computing value3 ahead of time?

cbrumelle
  • 171
  • 1
  • 7
  • You can format code with the "010001" button. `` tags are not recognised. I've done it for you. – Álvaro González Aug 03 '10 at 18:43
  • 1
    What indexes do you currently have? What does the EXPLAIN have to say about it? I realise you may not be able to get these things for your cut-down version, but it would be helpful if possible. – Brian Hooper Aug 03 '10 at 18:47
  • I have a couple of indexes on this table which are really being used for other queries. ID has an index, and then there are indexes for (ID, PARAM1) and (ID, PARAM2) that let me run queries where I can grab the top 10 rows for a given ID, ordered by PARAM1. From looking at EXPLAIN, the problem is the filesort that occurs when 'ORDER BY' is being used on the (unindexed) column that is being calculated on the fly. I'm not sure there is a solution for this. – cbrumelle Aug 03 '10 at 20:58
  • +1 good, well-written question. Welcome to SO – Jim Garrison Aug 04 '10 at 02:29
  • Do you expect this table to grow even bigger in the future? What storage engine is it? Can you give some specs of your mysql server hardware? – ceteras Aug 04 '10 at 11:29
  • The table will grow bigger over time, but not crazily so. I'm using MyISAM, which I could change if there is good reason. The server has 4 gigs of RAM, lots of disk space, is reasonably fast quad core, and is only running MySQL and a handful of services. – cbrumelle Aug 04 '10 at 17:13
  • check out this solution here, which is admittedly pretty involved: http://dba.stackexchange.com/questions/11841/how-can-i-speed-up-a-query-that-orders-by-a-calculated-field – allenwlee Oct 18 '14 at 18:23

3 Answers3

3

I've found 2 (sort of obvious) things that have helped speed this query up to a satisfactory level:

  1. Minimize the number of rows that need to be sorted. By using an index on the 'id' field and a subselect to trim the number of records first, the file sort on the computed column is not that bad. Ie:

    SELECT t.value1, (t.value2 * an_arbitrary_float) as SCORE
    FROM (SELECT * FROM sometable WHERE id = 1) AS t 
    ORDER BY SCORE DESC
    
  2. Try increasing sort_buffer_size in my.conf to speed up those filesorts.

cbrumelle
  • 171
  • 1
  • 7
2

I know this question is old, but I recently ran into this problem, and the solution I came up with was to use a derived table. In the derived table, create your calculated column. In the outer query, you can order by it. It seems to run considerably faster for my workload (orders of magnitude).

SELECT value1, value3
FROM (
    SELECT value1, (value2 * an_arbitrary_float) as value3 
    FROM sometable 
    WHERE id = 1 
) AS calculated
ORDER BY value3
siride
  • 200,666
  • 4
  • 41
  • 62
0

MySQL lacks many sexy features that could help you with this. Perhaps you could add a column with the calculated ranking, index it and write a couple of triggers to keep it updated.

Álvaro González
  • 142,137
  • 41
  • 261
  • 360
  • The problem is the weighting that is used to calculate SCORE, is based on user input - it's unknown until runtime. As such, there is no way to (easily) calculate the scores in advance. One possible solution could be to change the weighting of PARAMS from a float to a known set of values (0, .2, .4, .6, .8) but then storage requirements of those computed values would be massive. – cbrumelle Aug 03 '10 at 20:31