0

The below query is taking almost 14 secs to complete. I have a person table with 1M entries in it. Can anyone suggest me how to make the query faster and reduce the execution time to say something like 1, 2 or 3 seconds ? I am attaching the explain details below.

SELECT p.id, 
COUNT(CASE WHEN p.active=1 THEN 1 END) AS active_users_count, 
COUNT(CASE WHEN DATE_FORMAT(p.created_date,'%Y-%m-%d') =     DATE_FORMAT(NOW(),'%Y-%m-%d') THEN 1 END) AS today_install_count,  
COUNT(CASE WHEN DATE_FORMAT(p.created_date,'%Y-%m-%d') =    DATE_FORMAT(DATE_SUB(NOW(),INTERVAL 1 DAY), '%Y-%m-%d') THEN 1 END) AS     yesterday_install_count,  
COUNT(CASE WHEN DATE_FORMAT(p.created_date,'%Y-%m-%d') BETWEEN    DATE_SUB(CURDATE(),INTERVAL DAY(LAST_DAY(NOW())) DAY) AND CURDATE() THEN 1 END)         AS month_install_count,  
COUNT(CASE WHEN DATE_FORMAT(p.created_date,'%Y-%m-%d') BETWEEN     DATE_SUB(CURDATE(),INTERVAL 1 YEAR) AND CURDATE() THEN 1 END) AS     year_install_count, 
COUNT(CASE WHEN DATE_FORMAT(p.created_date,'%Y-%m-%d') BETWEEN     DATE_SUB(CURDATE(),INTERVAL 7 DAY) AND CURDATE() THEN 1 END) AS     week_install_count, 
COUNT('x') AS total_users_count
FROM person p 
WHERE p.app_id IN (SELECT p2.id FROM project p2 ) GROUP BY p.app_id

239 row(s) returned

Execution Time : 13.504 sec Transfer Time : 0.001 sec Total Time : 13.505 sec

shw create table for person and project

person  CREATE TABLE `person` (
  `id` bigint(20) NOT NULL AUTO_INCREMENT,
  `device_push_token` longtext NOT NULL,
  `created_date` datetime NOT NULL,
  `since_last_login` datetime NOT NULL,
  `platform` smallint(6) NOT NULL,
  `hwid` varchar(255) NOT NULL,
  `app_id` bigint(20) NOT NULL,
  `since_last_push` datetime NOT NULL,
  `no_of_pushes` smallint(6) NOT NULL DEFAULT '0',
  `language` varchar(50) DEFAULT NULL,
  `timezone` bigint(20) DEFAULT '0',
  `since_last_hour_push` datetime DEFAULT NULL,
  `version` bigint(20) NOT NULL DEFAULT '1',
  `active` tinyint(1) NOT NULL DEFAULT '1',
  PRIMARY KEY (`id`),
  UNIQUE KEY `hwid` (`hwid`,`app_id`),
  KEY `fk_person_platform` (`platform`),
  KEY `fk_person_project` (`app_id`),
  CONSTRAINT `fk_person_platform` FOREIGN KEY (`platform`) REFERENCES `platform` (`id`),
  CONSTRAINT `fk_person_project` FOREIGN KEY (`app_id`) REFERENCES `project` (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=1310384 DEFAULT CHARSET=latin1


project CREATE TABLE `project` (
  `id` bigint(20) NOT NULL AUTO_INCREMENT,
  `unique_id` varchar(300) NOT NULL,
  `name` longtext NOT NULL,
  `description` longtext,
  `ios_configure` bigint(20) DEFAULT NULL,
  `android_configure` bigint(20) DEFAULT NULL,
  `freq_push` bigint(20) DEFAULT NULL,
  `hour_push` bigint(20) DEFAULT NULL,
  `push_sent` bigint(20) DEFAULT '0',
  `push_opened` bigint(20) DEFAULT '0',
  `version` bigint(20) NOT NULL DEFAULT '1',
  `created_date` datetime NOT NULL,
  `updated_date` datetime NOT NULL,
  `active` tinyint(1) NOT NULL DEFAULT '1',
  `project_apprater` bigint(20) DEFAULT NULL,
  `type` smallint(6) NOT NULL DEFAULT '1',
  `status` bigint(20) DEFAULT '1',
  PRIMARY KEY (`id`),
  UNIQUE KEY `unique_id` (`unique_id`),
  KEY `fk_project_ios_config` (`ios_configure`),
  KEY `fk_project_android_config` (`android_configure`),
  KEY `fk_project_freq_push` (`freq_push`),
  KEY `fk_project_hour_push` (`hour_push`),
  KEY `fk_project_apprater` (`project_apprater`),
  KEY `fk_project_platform` (`type`),
  KEY `name` (`status`),
  CONSTRAINT `fk_project_android_config` FOREIGN KEY (`android_configure`) REFERENCES `project_configure_android` (`id`),
  CONSTRAINT `fk_project_apprater` FOREIGN KEY (`project_apprater`) REFERENCES `project_apprater` (`id`),
  CONSTRAINT `fk_project_freq_push` FOREIGN KEY (`freq_push`) REFERENCES `freq_push` (`id`),
  CONSTRAINT `fk_project_hour_push` FOREIGN KEY (`hour_push`) REFERENCES `hour_push` (`id`),
  CONSTRAINT `fk_project_ios_config` FOREIGN KEY (`ios_configure`) REFERENCES `project_configure_ios` (`id`),
  CONSTRAINT `fk_project_platform` FOREIGN KEY (`type`) REFERENCES `platform` (`id`),
  CONSTRAINT `name` FOREIGN KEY (`status`) REFERENCES `project_status` (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=313 DEFAULT CHARSET=latin1



id  select_type table   type    possible_keys   key key_len ref rows    Extra
1   PRIMARY p   index   \N  fk_person_project   8   \N  1158770 Using where
2   DEPENDENT SUBQUERY  p2  unique_subquery PRIMARY PRIMARY 8   func    1   Using index

Updated Full Query

SELECT 
  p3.id AS id,
  COALESCE(pug.active_users_count, 0) AS userCount, 
  p3.unique_id AS uniqueId,
  p3.name,
  p3.description,
  DATE_FORMAT(p3.created_date, '%m-%d-%Y %T') AS createdDate,
  p3.android_configure AS androidConfigure,
  p3.ios_configure AS iosConfigure,
  (SELECT 
    fp.active 
  FROM
    freq_push fp 
  WHERE fp.id = p3.freq_push) AS freqActive,
  (SELECT 
    hp.active 
  FROM
    hour_push hp 
  WHERE hp.id = p3.hour_push) AS hourActive,
  COALESCE(pug.total_users_count, 0) AS totalUserCount,
  COALESCE(pug.today_install_count, 0) AS todayInstallCount,
  COALESCE(pug.yesterday_install_count, 0) AS yesterdayInstallCount,
  COALESCE(pug.month_install_count, 0) AS monthInstallCount,
  COALESCE(pug.year_install_count, 0) AS yearInstallCount,
  COALESCE(pug.week_install_count, 0) AS weekInstallCount,
  (SELECT 
    plat.name 
  FROM
    platform plat 
  WHERE plat.id = p3.type) AS project_type ,
  ps.name
FROM 
  (SELECT 
    p.app_id,
    COUNT(
      CASE
        WHEN p.active = 1 
        THEN 1 
      END) AS active_users_count,
    COUNT(
      CASE
        WHEN DATE(p.created_date) = CURDATE() 
        THEN 1 
      END
    ) AS today_install_count,
    COUNT(
      CASE
        WHEN DATE(p.created_date) = DATE(DATE_SUB(NOW(), INTERVAL 1 DAY)) 
        THEN 1 
      END
    ) AS yesterday_install_count,
    COUNT(
      CASE
        WHEN DATE(p.created_date) BETWEEN DATE_SUB(
          CURDATE(),
          INTERVAL DAY(LAST_DAY(NOW())) DAY
        ) 
        AND CURDATE() 
        THEN 1 
      END
    ) AS month_install_count,
    COUNT(
      CASE
        WHEN DATE(p.created_date) BETWEEN DATE_SUB(CURDATE(), INTERVAL 1 YEAR) 
        AND CURDATE() 
        THEN 1 
      END
    ) AS year_install_count,
    COUNT(
      CASE
        WHEN DATE(p.created_date) BETWEEN DATE_SUB(CURDATE(), INTERVAL 7 DAY) 
        AND CURDATE() 
        THEN 1 
      END
    ) AS week_install_count,
    COUNT('x') AS total_users_count 
  FROM
    person p 
    INNER JOIN project p2 
      ON p.app_id = p2.id 
  GROUP BY p.app_id) AS pug 
  RIGHT JOIN project p3 
    ON p3.id = pug.app_id     
  INNER JOIN project_status ps  
  ON p3.status = ps.id
ORDER BY userCount DESC,
  createdDate DESC
Kenshin
  • 1,030
  • 2
  • 12
  • 41
  • 1
    show output from `show create table person` and output from `show create table project` – Drew Nov 02 '15 at 16:17
  • 1
    or just do a join and no correlated subquery – Drew Nov 02 '15 at 16:18
  • you're comparing derived values. there is NOTHING you can do to optimize this until you start using bare indexed "original" values. plus, why `date_format(...)` to convert a date/time value from datetime->string->date, when you could simply have `date()` and go datetime->date directly? – Marc B Nov 02 '15 at 16:18
  • where were those "I am attaching the explain details below". Here is how to [Share those](http://stackoverflow.com/a/13368211), seen under General Comments and sharing – Drew Nov 02 '15 at 16:24
  • Hi Drew, Pasted show create table person and project. Please check – Kenshin Nov 03 '15 at 07:22
  • A million persons? Maybe the table name is not deceptive? – Rick James Nov 28 '15 at 15:59

5 Answers5

1

Revamping this Answer from previous version. Basically Mark B is right in comments under question. Fortunately progress has been made by the OP, and the time has been reduced from say 13 seconds to a bit under 6. The OP said (in comments under his own Answer and in chat) that if the time could be reduced to under 1 second, he would consider other approaches. Like the one I was talking about with him concerning accepting somewhat stale metrics where he gets to choose how long the stale duration lasts. A trade-off between staleness and speed for user.

So here is one approach to that.

One uses Create Event to create an event that fires automatically every nnn (time period) Interval of his choosing. That event updates a table that his end-users access. The event itself runs his query from his Answer that you will see embedded in the event below.

Schema Change

create table appIdMetrics
(   -- this is the table Users hit against
    appId int not null primary key,
    active_users_count int not null,
    today_install_count int not null,
    yesterday_install_count int not null,
    month_install_count int not null,
    year_install_count int not null,
    week_install_count int not null,
    total_users_count int not null
);

create table evt_appIdMetrics
(   -- this is the worktable that only the Event uses
    -- while it puts together the refreshed data
    -- perhaps once every 5 minutes
    appId int not null primary key,
    active_users_count int not null,
    today_install_count int not null,
    yesterday_install_count int not null,
    month_install_count int not null,
    year_install_count int not null,
    week_install_count int not null,
    total_users_count int not null
);

Event Creation

drop event updateAppIdMetrics;
DELIMITER $$
CREATE EVENT updateAppIdMetrics
    ON SCHEDULE
        EVERY 5 MINUTE

DO BEGIN
    truncate table evt_appIdMetrics;    -- this is the table that only the evt has access to

    -- time to refresh this table (approx 6 seconds)
    -- 280 rows (count as per OP comments)
    insert into evt_appIdMetrics
    (appId,active_users_count,today_install_count,yesterday_install_count,
    month_install_count,year_install_count,week_install_count,total_users_count)
    select p.app_id, 
    COUNT(CASE WHEN p.active=1 THEN 1 END) AS active_users_count, 
    COUNT(CASE WHEN DATE(p.created_date)= CURDATE() THEN 1 END) AS today_install_count,  
    COUNT(CASE WHEN DATE(p.created_date) = DATE(DATE_SUB(NOW(),INTERVAL 1 DAY)) THEN 1 END) AS     yesterday_install_count,  
    COUNT(CASE WHEN DATE(p.created_date) BETWEEN DATE_SUB(CURDATE(),INTERVAL DAY(LAST_DAY(NOW())) DAY) AND CURDATE() THEN 1 END)         AS month_install_count,  
    COUNT(CASE WHEN DATE(p.created_date) BETWEEN     DATE_SUB(CURDATE(),INTERVAL 1 YEAR) AND CURDATE() THEN 1 END) AS     year_install_count, 
    COUNT(CASE WHEN DATE(p.created_date) BETWEEN     DATE_SUB(CURDATE(),INTERVAL 7 DAY) AND CURDATE() THEN 1 END) AS     week_install_count, 
    COUNT('x') AS total_users_count
    FROM person p 
    INNER JOIN project p2 ON p.app_id = p2.id 
    GROUP BY p.app_id;

    -- BEGIN LOCK (important)
    -- figure out a locking scheme (work-in-progress, not completed yet)
    truncate table appIdMetrics;    -- this is the table users access

    -- the following should take a split second on the approximately 280 rows (count as per OP comments)
    insert into appIdMetrics
    (appId,active_users_count,today_install_count,yesterday_install_count,
    month_install_count,year_install_count,week_install_count,total_users_count)
    select appId,active_users_count,today_install_count,yesterday_install_count,
    month_install_count,year_install_count,week_install_count,total_users_count
    from evt_appIdMetrics;
    -- complete locking schema (work-in-progress, not completed yet)
    -- END LOCK (important)
END;$$
DELIMITER ;
-- evt creation succeeded by passing Syntax Error check

Users interact with table appIdMetrics. When I get a chance I will tweak the locking scheme mentioned. The UX for the user should be a split second. Data refresh interval is adjustable by OP for stale factor. From my experience the event will fire for the first time after the first time period interval. So that means 5 minutes.

I will provide a link for Event management in a little bit. Edit: here it is. Events must be enabled.

Community
  • 1
  • 1
Drew
  • 24,851
  • 10
  • 43
  • 78
  • I will try this and let you know – Kenshin Nov 03 '15 at 05:47
  • Even after creating this covering index ( ALTER TABLE person ADD INDEX app_id_created_date_id ( app_id,created_date,id ) ) and with inner query, no improvement. Its still taking 10 secs – Kenshin Nov 03 '15 at 13:36
  • Drew, please see my answer, Replacing DATE_FORMAT() with DATE() ended the query execution time in 5 secs. Could you explain why there is a drastic improvement in performance? – Kenshin Nov 03 '15 at 14:18
  • the runtime optimation a.k.a. O(n) of date() is superior to date_format() due to less functionality in the former, but just enough functionality to make it right – Drew Nov 03 '15 at 14:39
  • Drew, Lock and Unlock are not allowed in stored procedures. what to do ?https://dev.mysql.com/doc/refman/5.7/en/stored-program-restrictions.html – Kenshin Nov 07 '15 at 06:58
  • Can we use GetLock() and ReleaseLock() for this ? – Kenshin Nov 07 '15 at 07:24
  • Mysql manual page on [Lock modes](https://dev.mysql.com/doc/refman/5.5/en/innodb-lock-modes.html) – Drew Nov 07 '15 at 14:09
  • Drew, can you suggest a locking feature...? – Kenshin Nov 17 '15 at 14:25
0

Maybe you can try to join the table. But I'm not sure this can reduce the execution time until 3 seconds.

 SELECT p.id, 
 COUNT(CASE WHEN p.active=1 THEN 1 END) AS active_users_count, 
 COUNT(CASE WHEN DATE_FORMAT(p.created_date,'%Y-%m-%d') =     DATE_FORMAT(NOW(),'%Y-%m-%d') THEN 1 END) AS today_install_count,  
 COUNT(CASE WHEN DATE_FORMAT(p.created_date,'%Y-%m-%d') =         DATE_FORMAT(DATE_SUB(NOW(),INTERVAL 1 DAY), '%Y-%m-%d') THEN 1 END) AS     yesterday_install_count,  
 COUNT(CASE WHEN DATE_FORMAT(p.created_date,'%Y-%m-%d') BETWEEN    DATE_SUB(CURDATE(),INTERVAL DAY(LAST_DAY(NOW())) DAY) AND CURDATE() THEN 1 END)         AS month_install_count,  
 COUNT(CASE WHEN DATE_FORMAT(p.created_date,'%Y-%m-%d') BETWEEN     DATE_SUB(CURDATE(),INTERVAL 1 YEAR) AND CURDATE() THEN 1 END) AS     year_install_count, 
 COUNT(CASE WHEN DATE_FORMAT(p.created_date,'%Y-%m-%d') BETWEEN     DATE_SUB(CURDATE(),INTERVAL 7 DAY) AND CURDATE() THEN 1 END) AS     week_install_count, 
 COUNT('x') AS total_users_count
 FROM person p 
 INNER JOIN project p2 ON p.app_id = p2.id 
 GROUP BY p.app_id
Wilianto Indrawan
  • 2,394
  • 3
  • 21
  • 26
0

IN this query what I understood is, you need all the counts from last one year and not bothered about very old data. In that case,if you have a project_date in your project table then you can limit ids in sub query, this may help to perform much better than the old one.

SELECT p.id, 
COUNT(CASE WHEN p.active=1 THEN 1 END) AS active_users_count, 
COUNT(CASE WHEN DATE_FORMAT(p.created_date,'%Y-%m-%d') =DATE_FORMAT(NOW(),'%Y-%m-%d') THEN 1 END) AS today_install_count,  
COUNT(CASE WHEN DATE_FORMAT(p.created_date,'%Y-%m-%d') =DATE_FORMAT(DATE_SUB(NOW(),INTERVAL 1 DAY), '%Y-%m-%d') THEN 1 END) AS yesterday_install_count,  
COUNT(CASE WHEN DATE_FORMAT(p.created_date,'%Y-%m-%d') BETWEEN DATE_SUB(CURDATE(),INTERVAL DAY(LAST_DAY(NOW())) DAY) AND CURDATE() THEN 1 END) AS month_install_count,  
COUNT(CASE WHEN DATE_FORMAT(p.created_date,'%Y-%m-%d') BETWEEN DATE_SUB(CURDATE(),INTERVAL 1 YEAR) AND CURDATE() THEN 1 END) AS year_install_count, 
COUNT(CASE WHEN DATE_FORMAT(p.created_date,'%Y-%m-%d') BETWEEN DATE_SUB(CURDATE(),INTERVAL 7 DAY) AND CURDATE() THEN 1 END) AS week_install_count
FROM person p 
WHERE p.app_id IN (SELECT p2.id FROM project p2 AND p2.project_date > DATE_SUB(CURDATE(),INTERVAL 1 YEAR)) 
GROUP BY p.app_id;

Now You can take the total count of each project Id seperately and merge with the above counts.

SELECT p.id, COUNT('x') AS total_users_count 
FROM person p 
WHERE p.app_id IN (SELECT p2.id FROM project p2) 
GROUP BY p.app_id;

Hope this will help.

0

The below query execution time reduced to 5 secs. Can you guys explain why changing from DATE_FORMAT() TO DATE() gives this drastic improvement ?

SELECT p.app_id, 
 COUNT(CASE WHEN p.active=1 THEN 1 END) AS active_users_count, 
 COUNT(CASE WHEN DATE(p.created_date)= CURDATE() THEN 1 END) AS today_install_count,  
 COUNT(CASE WHEN DATE(p.created_date) = DATE(DATE_SUB(NOW(),INTERVAL 1 DAY)) THEN 1 END) AS     yesterday_install_count,  
 COUNT(CASE WHEN DATE(p.created_date) BETWEEN DATE_SUB(CURDATE(),INTERVAL DAY(LAST_DAY(NOW())) DAY) AND CURDATE() THEN 1 END)         AS month_install_count,  
 COUNT(CASE WHEN DATE(p.created_date) BETWEEN     DATE_SUB(CURDATE(),INTERVAL 1 YEAR) AND CURDATE() THEN 1 END) AS     year_install_count, 
 COUNT(CASE WHEN DATE(p.created_date) BETWEEN     DATE_SUB(CURDATE(),INTERVAL 7 DAY) AND CURDATE() THEN 1 END) AS     week_install_count, 
 COUNT('x') AS total_users_count
 FROM person p 
 INNER JOIN project p2 ON p.app_id = p2.id 
 GROUP BY p.app_id
Kenshin
  • 1,030
  • 2
  • 12
  • 41
  • the runtime optimation a.k.a. O(n) of date() is superior to date_format() due to less functionality in the former, but just enough functionality to make it right – Drew Nov 03 '15 at 14:46
  • I want to make it below 2 secs. creating a covering index like (created_date, app_id, id ) didn't help. I thought it would give improvement, but no improvement at all even after creating this index. Do you know why ? – Kenshin Nov 04 '15 at 06:48
  • I want to group by application identifier and not by person identifier. I want to get the count of people belongs to each application. That's why app_id. – Kenshin Nov 05 '15 at 05:51
  • Okay, sorry, it is actually SELECT p.app_id, and not id – Kenshin Nov 05 '15 at 05:59
  • Please see the updated full query. Its taking 5.79 secs....Please check if there is anything wrong in it. Is it possible to make any more improvement in that query ? could it be possible to make it less than 3 secs ? – Kenshin Nov 05 '15 at 06:10
  • How accurate does it need to be, roughly. Accurate up to the last 5 minutes if it performs amazing for you ? And how many rows does it return, like 239 ? I forget. – Drew Nov 05 '15 at 06:16
  • It returns 281 rows, taking 5.790 secs for execution. Accurate up to the last 5 minutes ? what is it? – Kenshin Nov 05 '15 at 06:23
  • Let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/94280/discussion-between-arun-and-drew). – Kenshin Nov 05 '15 at 06:31
0

Another optimization thought... In a subquery, compute once how many days ago created_date, as an integer. Then in the outer query do the more efficient

age <= 365 AS year_install_count,
age <=   7 AS weel_install_count,
...

Note that x <= y is a "Boolean", which displays as "1" for true and "0" for false. So, no need for repeated DATE_SUB, DATE_FORMAT or later COALESCE, etc.

For getting age, try DATEDIFF(created_date, CURRDATE) or play with TO_DAYS(CURRDATE()) - TO_DAYS(CURRDATE()). Caveat: it may be off by 1.

Rick James
  • 135,179
  • 13
  • 127
  • 222