0

This query:

SELECT `subscribers`.`email_address`, `subscribers`.`first_name`, `subscribers`.`last_name`,
        GROUP_CONCAT( DISTINCT t1.value SEPARATOR '|' ) AS 'Colors', GROUP_CONCAT( DISTINCT t2.value SEPARATOR '|' ) AS 'Languages'
FROM `subscribers`
LEFT JOIN `subscribers_multivalued` AS `t1` ON subscribers.subscriber_id = t1.subscriber_id AND t1.field_id = 112
LEFT JOIN `subscribers_multivalued` AS `t2` ON subscribers.subscriber_id = t2.subscriber_id AND t2.field_id = 111
WHERE (list_id = 40) AND (state = 1)
GROUP BY `subscribers`.`email_address` , `subscribers`.`first_name` , `subscribers`.`last_name`

With execute plan:

id  select_type     table   type    possible_keys   key     key_len     ref     rows    Extra
1   SIMPLE  subscribers     ref     FK_list_id,state_date_added     FK_list_id  4   const   1753610     Using where; Using filesort
1   SIMPLE  t1  ref     subscriber_fk,field_fk  subscriber_fk   4   chad0598_mailablel.subscribers.subscriber_id    1    
1   SIMPLE  t2  ref     subscriber_fk,field_fk  subscriber_fk   4   chad0598_mailablel.subscribers.subscriber_id    1    

Caused error for uknown reason: General error: 3 Error writing file '/tmp/MYzamaNT' (Errcode: 28) in mysql.

I rewrote it like this groupy by subscriber_id insead:

SELECT `subscribers`.`email_address`, `subscribers`.`first_name`, `subscribers`.`last_name`,
        GROUP_CONCAT( DISTINCT t1.value SEPARATOR '|' ) AS 'Colors', GROUP_CONCAT( DISTINCT t2.value SEPARATOR '|' ) AS 'Languages'
FROM `subscribers`
LEFT JOIN `subscribers_multivalued` AS `t1` ON subscribers.subscriber_id = t1.subscriber_id AND t1.field_id = 112
LEFT JOIN `subscribers_multivalued` AS `t2` ON subscribers.subscriber_id = t2.subscriber_id AND t2.field_id = 111
WHERE (list_id = 40) AND (state = 1)
GROUP BY `subscribers`.`subscriber_id`

The plan of this query is better (not filesort):

id  select_type     table   type    possible_keys   key     key_len     ref     rows    Extra
1   SIMPLE  subscribers     ref     FK_list_id,state_date_added     FK_list_id  4   const   1753610     Using where
1   SIMPLE  t1  ref     subscriber_fk,field_fk  subscriber_fk   4   chad0598_mailablel.subscribers.subscriber_id    1    
1   SIMPLE  t2  ref     subscriber_fk,field_fk  subscriber_fk   4   chad0598_mailablel.subscribers.subscriber_id    1   

It seems that it works in mysql and it much faster that previous query. Although it works in mysql it's not compliant sql standard there shouldn't be aggregated fields in fields' list that are not present in GROUP BY. I there a way to make the query as fast as the last one and make it sql compliant?

I tried to rewrite it this way too:

SELECT `subscribers`.`email_address`, `subscribers`.`first_name`, `subscribers`.`last_name`,
        t1.colors AS 'Colors', t2.languages AS 'Languages'
FROM `subscribers`
LEFT JOIN (SELECT subscriber_id, GROUP_CONCAT( DISTINCT value SEPARATOR '|' ) AS 'colors'
           FROM subscribers_multivalued
           WHERE field_id = 112
           GROUP BY subscriber_id)  t1 ON t1.subscriber_id = `subscribers`.`subscriber_id`
LEFT JOIN (SELECT subscriber_id, GROUP_CONCAT( DISTINCT value SEPARATOR '|' ) AS 'languages'
           FROM subscribers_multivalued
           WHERE field_id = 111
           GROUP BY subscriber_id)  t2 ON t2.subscriber_id = `subscribers`.`subscriber_id`         

The plan of this query is much worse:

1   PRIMARY     subscribers     ALL     NULL    NULL    NULL    NULL    23358546     
1   PRIMARY     <derived2>  ALL     NULL    NULL    NULL    NULL    900000   
1   PRIMARY     <derived3>  ALL     NULL    NULL    NULL    NULL    900000   
3   DERIVED     subscribers_multivalued     ALL     field_fk    field_fk    4       20621115    Using filesort
2   DERIVED     subscribers_multivalued     ALL     field_fk    field_fk    4       20621115    Using filesort

and I couldn't wait untill it returns data;

Oleg
  • 2,733
  • 7
  • 39
  • 62
  • It actually is compliant with SQL-2003+ standard (although MySQL also allows non-compliant queries, it doesn't really do a sane check). – ypercubeᵀᴹ Nov 02 '12 at 15:24
  • 1
    I would certainly go with the faster in this case, since it does return deterministic results. You can't have this query qritten to run in many DBMS, because of the `GROUP_CONCAT()` function anyway. – ypercubeᵀᴹ Nov 02 '12 at 15:25
  • `General error` has nothing to do with your query being wrong or not compliant, it is a mysql server issue. – Salman A Nov 02 '12 at 15:27
  • Which tables are the columns `list` and `state` in? – ypercubeᵀᴹ Nov 02 '12 at 15:27
  • This error happens when the first query was executed, so it cause error of mysql. – Oleg Nov 02 '12 at 15:35
  • List_id and stat are fields of subscriber table. – Oleg Nov 02 '12 at 15:36
  • Do you mean that the last query is SQL-2003+ compliant (exception is GROUP_CONCAT()) – Oleg Nov 02 '12 at 15:37
  • @ypercube The SQL Standard allows for `SELECT`ed columns not in an aggregate or `GROUP`ed (granted that it's potentially 'safe' here, anyways, assuming `subscriber_id` is unique)? Otherwise... having a table with `_multivalued` appended strikes me as a little suspect; I'm guessing this is some form of EAV table, and not really 'multi-valued' (which usually refers to columns, where the value has to be parsed out)? – Clockwork-Muse Nov 02 '12 at 15:52
  • @Clockwork-Muse: I assumed that `subscriber_id` is the primary key of `subscriber`. About the SQL-compliance, that restriction is in SQL-92. The SQL-2003+ standard allow more. See my answer here: [Why does MySQL add a feature that conflicts with SQL standards?](http://stackoverflow.com/questions/7594865/why-does-mysql-add-a-feature-that-conflicts-with-sql-standards) – ypercubeᵀᴹ Nov 02 '12 at 15:55
  • suscriber_id in subscribers is primary key. – Oleg Nov 02 '12 at 16:05
  • @ypercube, ah, so the way it's being used here is (assumedly, given current knowledge of this schema) technically compliant even though 1) mySQL actually implements the feature in a non-compliant way, and 2) no other RDBMS (that I'm aware of) will let you try this. This would be a nicely sane way to make some of these requests, I'll admit (provided functionally dependent columns). – Clockwork-Muse Nov 02 '12 at 16:05
  • It's really some sort of EAV, this table is multivalues because every field can have many values, in other fields only one value. – Oleg Nov 02 '12 at 16:06
  • @Clockwork-Muse: Postgres allows this, (in version 9.1 I think) – ypercubeᵀᴹ Nov 02 '12 at 16:07
  • Please, don't hesitate to answer the question too. If you have link to SQL 2003 standard that confirms, it would be great too. – Oleg Nov 02 '12 at 17:03

1 Answers1

2

I'd keep the faster one. You can also test/benchmark this variation:

SELECT s.email_address, 
       s.first_name, 
       s.last_name,
       COALESCE(t1.Colors, '') AS Colors,
       COALESCE(t2.Languages, '') AS Languages
FROM subscribers AS s
  LEFT JOIN 
    ( SELECT t1.subscriber_id, 
          GROUP_CONCAT( DISTINCT t1.value SEPARATOR '|' ) AS Colors
      FROM subscribers AS s
        JOIN subscribers_multivalued AS t1
          ON s.subscriber_id = t1.subscriber_id 
      WHERE t1.field_id = 112
        AND s.list_id = 40 
        AND s.state = 1
      GROUP BY t1.subscriber_id 
    ) AS t1
    ON s.subscriber_id = t1.subscriber_id 
  LEFT JOIN 
    ( SELECT t2.subscriber_id, 
             GROUP_CONCAT( DISTINCT t2.value SEPARATOR '|' ) AS Languages
      FROM subscribers AS s
        JOIN subscribers_multivalued AS t2
          ON s.subscriber_id = t2.subscriber_id 
      WHERE t2.field_id = 111
        AND s.list_id = 40 
        AND s.state = 1
      GROUP BY t2.subscriber_id 
    ) AS t2
    ON s.subscriber_id = t2.subscriber_id 
WHERE s.list_id = 40 
  AND s.state = 1 ;

If (field_id, subscriber_id, value) is unique in table subscribers_multivalued, you can also drop the two DISTINCT.


Regarding speed and efficiency, check the indexes you have. These would help both your version and this one:

  • in the subscribers table, an index on either (list_id, state, subscriber_id) or (state, list_id, subscriber_id)

  • in the subscribers_multivalued table, an index on (field_id, subscriber_id, value) or (second best) on (field_id, subscriber_id).

ypercubeᵀᴹ
  • 113,259
  • 19
  • 174
  • 235