1

I have the following SQL query (DB is MySQL 5):

select
    event.full_session_id,
    DATE(min(event.date)),
    event_exe.user_id,
    COUNT(DISTINCT event_pat.user_id)
FROM
    event AS event
JOIN event_participant AS event_pat ON
    event.pat_id = event_pat.id
JOIN event_participant AS event_exe on
    event.exe_id = event_exe.id
WHERE
    event_pat.user_id <> event_exe.user_id
GROUP BY
    event.full_session_id; 

"SHOW CREATE TABLE event":

CREATE TABLE `event` (
  `id` int(12) NOT NULL AUTO_INCREMENT,
  `date` datetime NOT NULL,
  `session_id` varchar(64) DEFAULT NULL,
  `full_session_id` varchar(72) DEFAULT NULL,
  `pat_id` int(12) DEFAULT NULL,
  `exe_id` int(12) DEFAULT NULL,
  PRIMARY KEY (`id`),
  KEY `SESSION_IDX` (`full_session_id`),
  KEY `PAT_ID_IDX` (`pat_id`),
  KEY `DATE_IDX` (`date`),
  KEY `SESSLOGPATEXEC_IDX` (`full_session_id`,`date`,`pat_id`,`exe_id`)
) ENGINE=MyISAM AUTO_INCREMENT=371955 DEFAULT CHARSET=utf8

"SHOW CREATE TABLE event_participant":

CREATE TABLE `event_participant` (
  `id` int(12) NOT NULL AUTO_INCREMENT,
  `user_id` varchar(64) NOT NULL,
  `alt_user_id` varchar(64) NOT NULL,
  `username` varchar(128) NOT NULL,
  `usertype` varchar(32) NOT NULL,
  PRIMARY KEY (`id`),
  UNIQUE KEY `ALL_UNQ` (`user_id`,`alt_user_id`,`username`,`usertype`),
  KEY `USER_ID_IDX` (`user_id`)
) ENGINE=MyISAM AUTO_INCREMENT=5397 DEFAULT CHARSET=utf8

Also, the query itself seems ugly, but this is legacy code on a production system, so we are not expected to change it (at least for now).

The problem is that, there is around 36 million record on the event table (in the production system), so there have been frequent crashes of the DB machine due to using temporary;using filesort processing (they provided these EXPLAIN outputs, unfortunately, I don't have them right now. I'll try to update them to this post later.)

The customer asks for a "quick fix" by adding indices. Currently we have indices on full_session_id, pat_id, date (separately) on event and user_id on event_participant. Thus I'm thinking of creating a composite index (pat_id, exe_id, full_session_id, date) on event- this index comprises of the fields in the join (equivalent to where ?), then group by, then aggregate (min) parts. This is just an idea because we currently don't have that kind of data volume to test, so we try the best we could first.

My question is:

Thanks in advance for your help :)

Update:

I have updated the full table description for the two related tables.

MySQL version is 5.1.69. But I think we don't need to worry about the ambiguous data issue mentioned in the comments, because it seems there won't be ambiguity for our data. Specifically, for each full_session_id, there is only one "event_exe.user_id" returned (it's just a business logic in the application)

So, what do you think about my 2 questions ?

ramcrys
  • 335
  • 1
  • 2
  • 9
  • That `GROUP BY` is invalid, will raise an error for newer MySQL versions (unless in compatibility mode.) You typically GROUP BU the same columns as you SELECT, except those who are arguments to set functions. I.e. try `GROUP BY event.full_session_id, event_exe.user_id`. – jarlh May 23 '19 at 13:10
  • is `event.full_session_id` unique in your table meaning does it has a PRIMARY or UNIQUE key? If not your query is invalid SQL see [MySQL Handling of GROUP BY](https://dev.mysql.com/doc/refman/5.7/en/group-by-handling.html) – Raymond Nijland May 23 '19 at 13:10
  • @jarlh don't forget MySQL supports a optional feature from the SQL:1999 standard which permits such nonaggregates if they are [functionally dependent](https://dev.mysql.com/doc/refman/5.7/en/group-by-functional-dependence.html) on GROUP BY columns – Raymond Nijland May 23 '19 at 13:12
  • Yes, there is a Unique index on the full_session_id column, I forgot to mention the unique part. – ramcrys May 23 '19 at 13:14
  • 1
    i also advice you to use `SHOW CREATE TABLE table` for every table involved in query question and post it here so we know the datatypes and defined indexes beter.. Code tells more and is more clear then explaining it with text.. – Raymond Nijland May 23 '19 at 13:16
  • 1
    @RaymondNijland, but not MySQL version 5, I suppose? I'd recommend the old rule anyway, because it will never go wrong. – jarlh May 23 '19 at 13:18
  • *" but not MySQL version 5, I suppose? I'd recommend the old rule, because it will never go wrong"* @jarlh fair warning but functionally dependent should more or less works under MySQL 5.7.5 but you don't have a safety net (error) if the query is written wrong.. Just like the manual says.. **So you should not be using functionally dependent under MySQL 5.7.5 to be safe** – Raymond Nijland May 23 '19 at 13:22
  • Yes, but I confirm that this query has been working in the system for a while. Just recently the problem arises due to the size of the table (supposedly). So let's just assume that we aren't having a syntax problem in this case :) – ramcrys May 23 '19 at 13:24
  • @ramcrys it's not about syntax error, its about getting invalid data set meaning you get unrelated data in the selected columns which are not in the GROUP BY .. Which you can get when your query is using GROUP BY invalid assuming older MySQL version which does not know sql_mode ONLY_FULL_GROUP_BY – Raymond Nijland May 23 '19 at 13:26
  • @RaymondNijland I do not think you can have functional dependency between two fields coming from different tables. – Shadow May 23 '19 at 13:32
  • Oh that's a nasty issue. So let's assume that we are using the correct version, and the mode is turned on for that feature too. In which case can we receive unrelated data here ? I'm just really curious to learn more. – ramcrys May 23 '19 at 13:34
  • *"I do not think you can have functional dependency between two fields coming from different tables. "* @Shadow well actually you can see https://www.db-fiddle.com/f/j2urtL3r83aaRf1wQUEyd/2 – Raymond Nijland May 23 '19 at 13:44
  • *"So let's assume that we are using the correct version, and the mode is turned on for that feature too. In which case can we receive unrelated data here"* See mine comment to Shadow your query is then valid with the correct data assuming you are using MySQL version 5.7.5+ – Raymond Nijland May 23 '19 at 13:45
  • This is really weird. I'm sure this query has been used for a while, and the testing haven't reported incorrect results. Might be possible that I have copied incorrectly the query here. So I'll double check tomorrow and maybe create a full fiddle for it. But currently, what do you think about my questions (assuming that I copied incorrectly the query). Or we just wait for the full fiddle for more insights ? – ramcrys May 23 '19 at 14:39
  • @ramcrys - There is a huge difference between 5.0 and 5.7 -- Please be more specific about the Version. And I don't want to start working on this until I see the `SHOWs`. – Rick James May 23 '19 at 23:04
  • Thanks for all your comments, guys. I have updated the tables description as requested. – ramcrys May 24 '19 at 07:54
  • The statements looks like it groups the majority of data as the where clause excludes joined event_participant records with the same user_id value. Indexes help with an index seek when the returned result is a significantly smaller set of the full set of rows. An index seek that will finally return more rows then 20%-30% of the total rows is slower than a plain table scar or an index scan. Do not count the grouped rows but all rows build the grouping. Still, if returned results are few and indexes do help, a set of pat_id, exe_id, full_session_id as 3 single indexes is needed. – Stefanos Zilellis May 24 '19 at 14:34
  • For query optimization, if there is a date filter (need date index) use it on the first join and place the where onto the 2nd join. this may help to choose a better execution plan. – Stefanos Zilellis May 24 '19 at 14:34
  • "Do not count the grouped rows but all rows build the grouping" -> sorry, I'm not sure what you mean by this ? And about the query optimization, could you post your suggested query (because "date" filter is on the `event` table, not on the join condition, so I'm also confused here what to change) – ramcrys May 27 '19 at 06:46

0 Answers0