3

I have a SQL query which gives the correct result, but performs too slow.

The query operates on the following three tables:

  • customers contains lots of customer data like name, address, phone etc. To simplify the table i am only using the name.

  • customdatas contains certain custom (not customer) data. (The tables are created in software, which is why the plural form is wrong for this table)

  • customercustomdatarels associates custom data with a customer.

customers

Id                                  Name            (many more columns)
-----------------------------------------------------------------------
8053c6f4c5c5c631054ddb13d9186117    MyCustomer      ...
2efd2aa5711ddfade1f829b12dd88cf3    CheeseFactory   ...

customdata

id                                  key
-------------------------------------------------
22deb172c1af6e8e245634a751871564    favoritsport
86eea84d296df9309ad6ff36fd7f856e    favoritcheese

customercustomdatarels (relation between customer and custom data - with corresponding value)

customer                            customdata                          value
-------------------------------------------------------------------------------------
8053c6f4c5c5c631054ddb13d9186117    22deb172c1af6e8e245634a751871564    cycling
8053c6f4c5c5c631054ddb13d9186117    86eea84d296df9309ad6ff36fd7f856e    cheddar
2efd2aa5711ddfade1f829b12dd88cf3    22deb172c1af6e8e245634a751871564    football
2efd2aa5711ddfade1f829b12dd88cf3    86eea84d296df9309ad6ff36fd7f856e    mouldy

What i want is a table basically consisting of all data in customers with an variable amount of extra columns, corresponding to the custom data specified in customercustomdatarels. These columns should be defined somewhere and I have therefore created the following table which defines such extra columns and maps them to a key in the customdata table:

test_customkeymapping

colkey  customkey
---------------------
1       favoritsport
2       favoritcheese

The result should then be:

Name            ExtraColumn_1   ExtraColumn_2
---------------------------------------------
CheeseFactory   football        mouldy
MyCustomer      cycling         cheddar

(ExtraColumn_1 is therefore synonym for a customers' favorite sport and ExtraColumn_2 is a synonym for a customers' favorit cheese.)

This result is achieved by executing the following query:

SET @sql = NULL;

SELECT
    GROUP_CONCAT(DISTINCT
        CONCAT('MAX(CASE
                WHEN ckm.colkey = ', colkey, ' THEN
                    (SELECT value FROM customercustomdatarels ccdr2
                     LEFT JOIN customdatas cd2
                       ON cd2.id = ccdr2.customdata
                     WHERE cd2.key = ckm.customkey AND c.Id = ccdr2.customer)
                END) AS ', CONCAT('`ExtraColumn_', colkey, '`'))
    ) INTO @sql
FROM test_customkeymapping;

SET @sql = CONCAT('SELECT c.Name, ', @sql, ' 
                   FROM customers c
                   LEFT JOIN customercustomdatarels ccdr
                     ON c.Id = ccdr.customer
                   LEFT JOIN customdatas cd
                     ON cd.Id = ccdr.customdata
                   LEFT JOIN test_customkeymapping ckm 
                     ON cd.key = ckm.customkey
                   GROUP BY c.Id');

PREPARE stmt FROM @sql;
EXECUTE stmt;

This works. But is too slow (for 7000 customers it takes ~10 seconds). The query was greatly influenced by the solution in this question: MySQL Join Multiple Rows as Columns

How do I optimize this query?

Community
  • 1
  • 1
CleanUp
  • 410
  • 4
  • 12
  • Seems like a beginner mistake to want something like this. Usually you should simply accept that the number of columns is static. If you want a dynamic amount of columns the query has to be generated dynamically and this is indeed terrible for performance. – wvdz Jul 02 '14 at 12:49
  • It's also very hard, if not impossible, to think of a reasonable scenario in which this is useful. It seems you are trying to do something in your database which you should do in your application. – wvdz Jul 02 '14 at 12:52

1 Answers1

4

I don't understand why you are using a subquery in the group_concat() statement. Wouldn't this generate the code that you really want to run?

SET @sql = NULL;

SELECT
    GROUP_CONCAT(DISTINCT
        CONCAT('MAX(CASE WHEN ckm.colkey = ', colkey, ' THEN ccd.value END) AS ',  
               CONCAT('ExtraColumn_', colkey, ''))
    ) INTO @sql
FROM test_customkeymapping;

SET @sql = CONCAT('SELECT c.Name, ', @sql, ' 
                   FROM customers c
                   LEFT JOIN customercustomdatarels ccdr
                     ON c.Id = ccdr.customer
                   LEFT JOIN customdatas cd
                     ON cd.Id = ccdr.customdata
                   LEFT JOIN test_customkeymapping ckm 
                     ON cd.key = ckm.customkey
                   GROUP BY c.Id');

PREPARE stmt FROM @sql;
EXECUTE stmt;

Note: This is untested, but the idea is the same. Use the values from the main from statement for your work rather than the values from some extra, unnecessary subquery.

Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786
  • Thanks. That certainly simplifies the query. I don't know why i couldn't see the sub query was unnecessary at that moment. The execution time is almost identical however. – CleanUp Jul 02 '14 at 13:22
  • @CleanUp . . . Do you have indexes on the `join` keys? It is also possible that your data is sufficiently large that the `group by` takes time. In your case, that would be due to wide columns rather than numbers of rows. – Gordon Linoff Jul 02 '14 at 13:27
  • There is a lot of data, but the key issue was some missing indexes for some of the keys as you pointed out. I think it will perform sufficiently - for now at least. Thank you very much. – CleanUp Jul 02 '14 at 13:39