-7

When counting dates using count(*) or count(column) there are differences whenever I use group by to separate by year then month in contrast with simply using where column between for each month. Why is that? I was expecting that both group by and where would yield the same result for each month.

select year(datehour), month(datehour), count(*)
from `dates`
group by year(datehour), month(datehour);

select year(datehour), month(datehour), count(*)
from `dates`
where datehour between '2018-01-01' and '2018-01-31';

DDL

CREATE TABLE IF NOT EXISTS `dates` (
  `id` int(6) unsigned NOT NULL auto_increment,
  `datehour` datetime NOT NULL,
  PRIMARY KEY (`id`)
) DEFAULT CHARSET=utf8;
INSERT INTO `dates` (datehour) VALUES
('2018-01-09 11:42:00')
,('2018-01-09 11:35:00')
,('2018-01-17 16:24:00')
,('2018-01-16 17:58:00')
,('2018-01-02 15:48:00')
,('2018-01-08 15:48:00')
,('2018-01-02 17:17:00')
,('2018-01-11 18:22:00')
,('2018-01-11 17:19:00')
,('2018-01-11 17:58:00')
,('2018-01-11 18:51:00')
,('2018-01-11 16:46:00')
,('2018-01-11 17:04:00')
,('2018-01-16 19:32:00')
,('2018-01-16 15:49:00')
,('2018-01-13 16:26:00')
,('2018-01-12 14:52:00')
,('2018-01-17 10:57:00')
,('2018-01-25 10:45:00')
,('2018-02-13 12:15:00')
,('2018-02-14 12:22:00')
,('2018-02-13 15:13:00')
,('2018-01-12 15:18:00')
,('2018-01-12 16:36:00')
,('2018-01-12 16:54:00')
,('2018-01-12 16:48:00')
,('2018-01-30 11:51:00')
,('2018-01-30 11:49:00')
,('2018-01-30 14:53:00')
,('2018-01-30 14:52:00')
,('2018-01-30 17:20:00')
,('2018-01-30 12:41:00')
,('2018-02-06 08:16:00')
,('2018-01-30 12:07:00')
,('2018-02-02 16:35:00')
,('2018-02-04 14:23:00')
,('2018-01-30 11:27:00')
,('2018-01-30 12:29:00')
,('2018-01-30 10:28:00')
,('2018-01-30 16:47:00')
,('2018-01-30 13:20:00')
,('2018-01-30 13:19:00')
,('2018-02-02 16:37:00')
,('2018-01-30 17:17:00')
,('2018-01-30 18:09:00')
,('2018-02-06 14:11:00')
,('2018-01-30 11:31:00')
,('2018-01-30 11:29:00')
,('2018-01-30 17:16:00')
,('2018-01-30 15:52:00')
,('2018-01-30 16:50:00')
,('2018-01-30 17:10:00')
,('2018-01-31 09:44:00')
,('2018-01-30 17:09:00')
,('2018-01-31 13:06:00')
,('2018-01-30 18:10:00')
,('2018-01-30 16:50:00')
,('2018-01-30 17:44:00')
,('2018-01-30 17:58:00')
,('2018-01-30 15:30:00')
,('2018-01-30 15:32:00')
,('2018-02-06 16:55:00')
,('2018-01-30 17:21:00')
,('2018-01-30 17:15:00')
,('2018-01-30 18:11:00')
,('2018-02-01 20:46:00')
,('2018-02-06 16:58:00')
,('2018-02-04 14:20:00')
,('2018-02-04 00:13:00')
,('2018-02-09 17:03:00')
,('2018-02-02 14:42:00')
,('2018-01-31 11:14:00')
,('2018-01-31 12:38:00')
,('2018-01-31 12:52:00')
,('2018-01-31 18:41:00')
,('2018-01-31 11:39:00')
,('2018-01-31 15:53:00')
,('2018-01-31 11:39:00')
,('2018-02-02 14:43:00')
,('2018-02-08 13:38:00')
,('2018-01-31 11:15:00')
,('2018-01-31 12:04:00')
,('2018-02-02 14:10:00')
,('2018-01-31 16:47:00')
,('2018-01-31 11:15:00')
,('2018-02-14 10:20:00')
,('2018-03-03 13:39:00')
,('2018-02-09 11:55:00')
,('2018-03-13 13:24:00')
,('2018-02-20 09:13:00')
,('2018-03-06 15:19:00')
,('2018-02-19 09:15:00')
,('2018-01-31 12:38:00')
,('2018-01-31 11:13:00')
,('2018-01-31 12:55:00')
,('2018-01-31 12:39:00')
,('2018-01-31 13:07:00')
,('2018-01-31 18:34:00')
,('2018-01-31 11:55:00')
,('2018-01-31 11:55:00')
,('2018-01-31 17:19:00')
,('2018-01-31 18:36:00')
,('2018-01-31 14:16:00')
,('2018-01-31 18:37:00')
,('2018-01-31 18:37:00')
,('2018-01-31 15:03:00')
,('2018-02-02 14:09:00')
,('2018-01-31 14:11:00')
,('2018-02-04 10:17:00')
,('2018-01-31 18:36:00')
,('2018-01-31 18:38:00')
,('2018-01-31 17:17:00')
,('2018-01-31 18:34:00')
,('2018-01-31 16:12:00')
,('2018-01-31 18:46:00')
,('2018-01-31 18:40:00')
,('2018-01-31 15:29:00')
,('2018-01-31 18:35:00')
,('2018-02-24 15:34:00')
,('2018-01-31 18:33:00')
,('2018-01-31 18:46:00')
,('2018-02-05 08:13:00')
,('2018-01-31 16:38:00')
,('2018-01-31 18:40:00')
,('2018-01-31 16:43:00')
,('2018-02-13 08:28:00')
,('2018-01-31 16:45:00')
,('2018-02-01 16:49:00')
,('2018-01-31 18:33:00')
,('2018-02-03 19:02:00')
,('2018-02-06 13:19:00')
,('2018-02-19 15:35:00')
,('2018-02-22 16:14:00')
,('2018-02-19 16:33:00')
,('2018-02-19 17:12:00')
,('2018-02-28 18:26:00')
,('2018-03-03 13:35:00')
,('2018-03-05 13:23:00')
,('2018-02-28 18:25:00')
,('2018-02-28 18:25:00')
,('2018-02-28 18:26:00')
,('2018-02-28 18:39:00')
,('2018-02-28 18:41:00')
,('2018-03-05 13:22:00')
,('2018-03-15 20:45:00')
,('2018-03-04 14:59:00')
,('2018-03-05 13:25:00')
,('2018-03-07 15:17:00')
,('2018-03-07 13:31:00')
,('2018-03-01 12:40:00')
,('2018-03-01 11:05:00')
,('2018-03-01 11:06:00')
,('2018-03-01 12:40:00')
,('2018-03-01 11:06:00')
,('2018-03-01 11:09:00')
,('2018-03-01 11:08:00')
,('2018-03-01 12:39:00')
,('2018-03-01 17:31:00')
,('2018-03-01 11:07:00')
,('2018-03-01 11:07:00')
,('2018-03-01 13:11:00')
,('2018-03-01 11:05:00')
,('2018-03-03 16:46:00')
,('2018-03-12 17:38:00')
,('2018-03-01 12:38:00')
,('2018-03-01 15:58:00')
,('2018-03-01 15:41:00')
,('2018-03-01 12:39:00')
,('2018-03-05 19:22:00')
,('2018-03-01 14:04:00')
,('2018-03-01 13:00:00')
,('2018-03-01 15:39:00')
,('2018-03-01 15:41:00')
,('2018-03-02 12:03:00')
,('2018-03-02 13:07:00')
,('2018-03-02 12:24:00')
,('2018-03-02 11:35:00')
,('2018-03-02 14:18:00')
,('2018-03-06 16:02:00')
,('2018-03-06 13:57:00')
,('2018-03-06 13:57:00')
,('2018-03-08 13:37:00')
,('2018-03-12 19:50:00')
,('2018-03-06 16:05:00')
,('2018-03-08 19:43:00')
,('2018-03-06 10:07:00')
,('2018-03-09 07:53:00')
,('2018-03-09 16:49:00')
,('2018-03-20 13:40:00')
,('2018-03-16 11:50:00')
,('2018-03-06 16:03:00')
,('2018-03-07 16:57:00')
,('2018-03-06 16:06:00')
,('2018-03-06 17:19:00')
,('2018-03-09 20:10:00')
,('2018-03-10 15:43:00')
,('2018-03-13 10:05:00')
,('2018-03-06 13:56:00')
,('2018-03-10 16:02:00')
,('2018-03-06 16:02:00')
,('2018-03-06 16:04:00')
,('2018-03-09 16:07:00')
,('2018-03-28 11:17:00')
,('2018-03-28 15:18:00')
,('2018-03-28 12:47:00')
,('2018-03-28 14:13:00')
,('2018-03-28 13:23:00')
,('2018-03-28 16:00:00')
,('2018-03-28 14:31:00')
,('2018-03-28 13:10:00')
,('2018-03-28 14:02:00')
,('2018-03-28 14:02:00')
,('2018-03-28 13:56:00')
,('2018-03-28 15:59:00')
,('2018-03-28 16:54:00')
,('2018-03-28 13:23:00')
,('2018-03-28 15:24:00')
,('2018-03-28 15:20:00')
,('2018-03-28 16:17:00')
,('2018-03-28 13:20:00')
,('2018-03-28 13:20:00')
,('2018-03-28 15:59:00')
,('2018-03-28 16:54:00')
,('2018-03-28 16:54:00')
,('2018-03-28 16:11:00')
,('2018-03-28 18:29:00')
,('2018-03-28 15:59:00')
,('2018-03-28 18:43:00')
,('2018-03-31 13:33:00')
,('2018-04-14 10:57:00')
,('2018-03-29 10:57:00')
,('2018-03-29 11:36:00')
,('2018-03-29 09:12:00')
,('2018-03-29 11:47:00')
,('2018-03-29 11:48:00')
,('2018-04-06 19:27:00')
,('2018-03-29 12:22:00')
,('2018-04-03 15:31:00')
,('2018-03-29 12:47:00')
,('2018-04-04 14:27:00')
,('2018-04-05 17:00:00')
,('2018-04-03 15:30:00')
,('2018-04-12 12:27:00')
,('2018-04-10 20:17:00')
,('2018-04-07 19:00:00')
,('2018-04-08 16:33:00')
,('2018-04-06 13:57:00')
,('2018-04-07 17:46:00')
,('2018-04-10 17:17:00')
,('2018-04-03 13:30:00')
,('2018-04-05 18:10:00')
;

http://www.sqlfiddle.com/#!9/63fe7f/1

Braiam
  • 1
  • 11
  • 47
  • 78
  • 1
    Superficially, one counts "groups" while the other counts "rows". You'd only ever get the same count when every group is exactly a group of one row. – bishop Oct 15 '18 at 17:12
  • @bishop If that were true, I would expect the same results on every group, [like on this question](https://stackoverflow.com/q/2722408/792066). Or at least, the count(*) result from the group by to be less than using where, since the amount of groups will be always less than the individual rows. – Braiam Oct 15 '18 at 17:20

1 Answers1

2

Strings like '2018-01-31' are considered '2018-01-31 00:00:00' when used with datetime and timestamp values; so basically, you are not counting the last day of the time span with the latter query.

If you want to do the latter, it can be simpler to do:

where datehour >= '2018-01-01' and datehour < '2018-02-01'

Uueerdo
  • 15,723
  • 1
  • 16
  • 21
  • `between 2018-01-01 and 2018-02-01` seems to be equivalent too. Thanks. – Braiam Oct 15 '18 at 17:25
  • Yeah, that should be fine but could run into discrepancies if something falls on exactly`2018-02-01 00:00:00` as such a record will be counted in both months. – Uueerdo Oct 15 '18 at 17:29
  • @Braiam: We typically avoid using BETWEEN comparison with datetime, because we want to avoid "overlap"... we don't want rows with a time component of exactly midnight "00:00:00" to fall into two distinct buckets and get counted twice, when it matches conditions for both buckets. One fix for BETWEEN is to "backup" the upper bound from midnight by one second, or one millinsecond. But the better approach is to do what Uueerdo suggested... use two comparsions, one `>=` and one `<` something like this: `WHERE datehour >= '2018-01-01' AND datehour < '2018-01-01' + INTERVAL 1 MONTH` – spencer7593 Oct 15 '18 at 17:58
  • @spencer7593 yeah, I got it. – Braiam Oct 15 '18 at 18:00