I am auditing user details from my application using open Id login .If a first time a user is login a OPEN ID we consider as signup . I am generating audit signin report using this details . Sample Table Data.
+---------+----------+-----------+---------------+
| USER_ID | PROVIDER | OPERATION | TIMESTAMP |
+---------+----------+-----------+---------------+
| 120 | Google | SIGN_UP | 1347296347000 |
| 120 | Google | SIGN_IN | 1347296347000 |
| 121 | Yahoo | SIGN_IN | 1347296347000 |
| 122 | Yahoo | SIGN_IN | 1347296347000 |
| 120 | Google | SIGN_UP | 1347296347000 |
| 120 | FaceBook | SIGN_IN | 1347296347000 |
+---------+----------+-----------+---------------+
In this table I want to exclude already SIGN_UP ed "SIGN_IN" ed user count based on provider .
Show Create table
CREATE TABLE `signin_details` (
`USER_ID` int(11) DEFAULT NULL,
`PROVIDER` char(40) DEFAULT NULL,
`OPERATION` char(40) DEFAULT NULL,
`TIMESTAMP` bigint(20) DEFAULT NULL
) ENGINE=InnoDB
I am using this query .
select
count(distinct(USER_ID)) as signin_count,
PROVIDER from signin_details s1
where
s1.USER_ID NOT IN
(
select
USER_ID
from signin_details
where
signin_details.PROVIDER=s1.PROVIDER
and signin_details.OPERATION='SIGN_UP'
and signin_details.TIMESTAMP/1000 BETWEEN UNIX_TIMESTAMP(CURRENT_DATE()-INTERVAL 1 DAY) * 1000 AND UNIX_TIMESTAMP(CURRENT_DATE()) * 1000
)
AND OPERATION='SIGN_IN' group by PROVIDER;
Explain Output:
+----+--------------------+----------------+------+---------------+------+---------+------+------+-----------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+--------------------+----------------+------+---------------+------+---------+------+------+-----------------------------+
| 1 | PRIMARY | s1 | ALL | NULL | NULL | NULL | NULL | 6 | Using where; Using filesort |
| 2 | DEPENDENT SUBQUERY | signin_details | ALL | NULL | NULL | NULL | NULL | 6 | Using where |
+----+--------------------+----------------+------+---------------+------+---------+------+------+-----------------------------+
Query Output :
+--------------+----------+
| signin_count | PROVIDER |
+--------------+----------+
| 1 | FaceBook |
| 2 | Yahoo |
+--------------+----------+
It takes more than 40 minutes to execute for 200k rows.
My assumption is it will check each row with total number of dependant subquery output.
My Assumption on this query.
A -> Dependant Outputs (B,C,D) .
A check with B
A check with C
A check with D
If dependant query output is larger it will take so long time to execute. How to improve this query?