4

I have two tables, players and games, created as follows:

CREATE TABLE IF NOT EXISTS `players` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `name` varchar(255) NOT NULL,
  `created_at` datetime NOT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1 AUTO_INCREMENT=1 ;

CREATE TABLE IF NOT EXISTS `games` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `player` int(11) NOT NULL,
  `played_at` datetime NOT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1 AUTO_INCREMENT=1 ;

I wish to extract 3 values for each day:

  1. The number of players created at that day
  2. The number of players played at that day
  3. The number of players having played for the first time at that day

So, suppose for example that the players table looks as follows:

+----+--------+---------------------+
| id | name   | created_at          |
+----+--------+---------------------+
|  1 | Alan   | 2016-02-01 00:00:00 |
|  2 | Benny  | 2016-02-01 06:00:00 |
|  3 | Calvin | 2016-02-02 00:00:00 |
|  4 | Dan    | 2016-02-03 00:00:00 |
+----+--------+---------------------+

And the games table looks as follows:

+----+--------+---------------------+
| id | player | played_at           |
+----+--------+---------------------+
|  1 |      1 | 2016-02-01 01:00:00 |
|  2 |      3 | 2016-02-02 02:00:00 |
|  3 |      2 | 2016-02-03 14:00:00 |
|  4 |      3 | 2016-02-03 17:00:00 |
|  5 |      3 | 2016-02-03 18:00:00 |
+----+--------+---------------------+

Then the query should return something like

+------------+-----+--------+-------+
| day        | new | played | first |
+------------+-----+--------+-------+
| 2016-02-01 | 2   | 1      | 1     |
| 2016-02-02 | 1   | 1      | 1     |
| 2016-02-03 | 1   | 2      | 1     |
+------------+-----+--------+-------+

I have a solution for 1 (new):

SELECT Date(created_at) AS day,
       Count(*)         AS new
FROM   players
GROUP  BY day;  

That's easy. I think I also have a solution for 2 (played), thanks to MySQL COUNT DISTINCT:

select Date(played_at) AS day,
       Count(Distinct player) AS played
FROM   games
GROUP  BY day;

But I have no idea how to get the needed result for 3 (first). I also don't know how to put everything in a single query, to save execution time (the games table may include millions of records).


In case you need it, here's a query which inserts the example data:

INSERT INTO `players` (`id`, `name`, `created_at`) VALUES
(1, 'Alan', '2016-02-01 00:00:00'),
(2, 'Benny', '2016-02-01 06:00:00'),
(3, 'Calvin', '2016-02-02 00:00:00'),
(4, 'Dan', '2016-02-03 00:00:00');

INSERT INTO `games` (`id`, `player`, `played_at`) VALUES
(1, 1, '2016-02-01 01:00:00'),
(2, 3, '2016-02-02 02:00:00'),
(3, 2, '2016-02-03 14:00:00'),
(4, 3, '2016-02-03 17:00:00'),
(5, 3, '2016-02-03 18:00:00');
Community
  • 1
  • 1
Bach
  • 6,145
  • 7
  • 36
  • 61

4 Answers4

4

One version is to get all relevant data into a union and do the analysis from there;

SELECT SUM(type='P') new, 
       COUNT(DISTINCT CASE WHEN type='G' THEN pid END) played, 
       SUM(type='F') first 
FROM (
  SELECT id pid, DATE(created_at) date, 'P' type FROM players 
  UNION ALL 
  SELECT player, DATE(played_at) date,  'G' FROM games 
  UNION ALL 
  SELECT player, MIN(DATE(played_at)),  'F' FROM games GROUP BY player
) z 
GROUP BY date;

In the union;

Records with type P is player creation statistics.
Records with type G is player related game statistics.
Records with type F is statistics for when players played their first game.

Joachim Isaksson
  • 176,943
  • 25
  • 281
  • 294
  • This is absolutely wonderful. Code-wise, it's probably as simple as it gets. How is it in terms of execution speed? – Bach Feb 16 '16 at 09:53
  • @Bach With basic indexing, it would seem to do fairly well, but for large amounts you may want to consider for example storing dates along with datetimes to avoid all DATE() conversions and grouping as much as you can in the subquery to minimize the amount of rows generated. – Joachim Isaksson Feb 16 '16 at 11:30
  • Joachim, thanks for your answer and for your explanations! – Bach Feb 16 '16 at 12:54
2

You can count the result of a temp table based on min(played_at) and filterd by having

select count(player) from 
   (  select player, min(played_at)  
      from games 
      group by player 
      having min(played_at) = YOUR_GIVEN_DATE ) as t;
ScaisEdge
  • 131,976
  • 10
  • 91
  • 107
  • First of all, you probably want `Day(Min(played_at))`. Secondly, this solution probably requires to have a query for every possible date, which is not feasible... – Bach Feb 16 '16 at 09:18
1

this query will give you the result:

select day,( select count(distinct(id)) from players where Date(created_at) = temp.day ) as no_created_at ,
( select count(distinct(player)) from games where Date(played_at) = temp.day) as no_played_at,
( select count(distinct(player)) from games  where Date(played_at) = 
(select min(Date(played_at)) from games internal_games 
where internal_games.player =games.player and Date(games.played_at) = temp.day )) as no_first_played_at
 from (
SELECT Date(created_at) AS day     
FROM   players
GROUP  BY day 
union 
select Date(played_at) AS day
FROM   games
GROUP  BY day) temp 

and the output:

enter image description here

Gouda Elalfy
  • 6,888
  • 1
  • 26
  • 38
  • This is **almost** precise, as the number of players playing on `2016-02-03` is 2 (at least with my example data). – Bach Feb 16 '16 at 09:29
  • @Bach I update it, you can use distinct with id of player for count instead count all rows – Gouda Elalfy Feb 16 '16 at 09:34
1

Here's a solution with a bunch of subqueries, which accounts for the possibility that players may have been created on days with no games, and vice versa:

select
    all_dates.date as day,
    ifnull(new.num, 0) as new,
    ifnull(players.num, 0) as players,
    ifnull(first.num, 0) as first
from (
    select date(created_at) as date from players
    union
    select date(played_at) from games
) as all_dates
left join (
    select date(created_at) as created_at_date, count(*) as num
    from players
    group by created_at_date
) as new on all_dates.date = new.created_at_date
left join (
    select date(played_at) as played_at_date, count(distinct player) as num
    from games
    group by played_at_date
) as players on all_dates.date = players.played_at_date
left join (
    select min_date, count(*) num
    from (
        select player, date(min(played_at)) as min_date
        from games
        group by player
    ) as players_first
    group by min_date
) as first on all_dates.date = first.min_date
order by day;
Emmett
  • 14,035
  • 12
  • 56
  • 81
  • It's nice. It returns NULL instead of 0 in days with no games (unlike in Joachim's method). – Bach Feb 16 '16 at 09:52