2

Simplified schema of m:n relation implementing a subscription model:

CREATE TABLE c (
  id INT(11) PRIMARY KEY AUTO_INCREMENT,
  name VARCHAR(32)
) ENGINE=MyISAM CHARACTER SET=UTF8;

CREATE TABLE t (
  id INT(11) PRIMARY KEY AUTO_INCREMENT,
  name VARCHAR(32)
) ENGINE=MyISAM CHARACTER SET=UTF8;

CREATE TABLE c2t (
  id INT(11) PRIMARY KEY AUTO_INCREMENT,
  cid INT(11) NOT NULL,
  tid INT(11) NOT NULL,
  dateStart DATE NULL,
  dateEnd DATE NULL
) ENGINE=MyISAM CHARACTER SET=UTF8;

INSERT INTO c (name) VALUES ('mike'),('carl'),('suzy');

INSERT INTO t (name) VALUES ('plan1'),('plan2'),('plan3'),('plan4');

INSERT INTO c2t (cid, tid, dateStart, dateEnd) VALUES
  (1, 1, '2014-01-01', '2014-07-31'),
  (1, 2, '2014-08-01', '2015-07-31'),
  (1, 1, '2015-08-01', null),
  (1, 3, '2015-09-01', null),

  (2, 1, '2014-01-01', '2015-07-31'),
  (2, 2, '2015-08-01', '2015-09-30'),
  (2, 3, '2015-09-30', null),

  (3, 1, '2014-01-01', '2014-12-31'),
  (3, 2, '2014-01-01', '2014-12-31'),
  (3, 3, '2015-01-01', '2015-10-31'),
  (3, 4, '2015-01-01', '2015-10-31');

I've developed a query to find the c's who have active subscriptions of t's:

SELECT c.*
FROM c
LEFT JOIN c2t ON c.id = c2t.cid
  AND NOW() BETWEEN COALESCE(dateStart, '0000-00-00')
    AND COALESCE(dateEnd, DATE_ADD(NOW(), INTERVAL 1 DAY))
GROUP BY c2t.cid
HAVING COUNT(c2t.id) > 0;

Result as expected:

id  name
1   mike
2   carl

The problem arises when I try to count the result rows. The query is almost identical, I've just dropped in a COUNT(*):

SELECT COUNT(*)
FROM c
LEFT JOIN c2t ON c.id = c2t.cid
  AND NOW() BETWEEN COALESCE(dateStart, '0000-00-00')
    AND COALESCE(dateEnd, DATE_ADD(NOW(), INTERVAL 1 DAY))
GROUP BY c2t.cid
HAVING COUNT(c2t.id) > 0;

Result:

`COUNT(*)`
2
1

Expected result would be a single row containing the number of rows found (2). I can only assume that the GROUP BY is interfering, but have no idea how to work around. Explanations are most welcome.

Lukasz Szozda
  • 162,964
  • 23
  • 234
  • 275
willydee
  • 187
  • 3
  • 10

2 Answers2

1

Wrap everything with subquery and use COUNT in outer query:

SELECT COUNT(*)
FROM (
  SELECT c.*
  FROM c
  LEFT JOIN c2t ON c.id = c2t.cid
    AND NOW() BETWEEN COALESCE(dateStart, '0000-00-00')
    AND COALESCE(dateEnd, DATE_ADD(NOW(), INTERVAL 1 DAY))
  GROUP BY c2t.cid
  HAVING COUNT(c2t.id) > 0
) AS sub
Lukasz Szozda
  • 162,964
  • 23
  • 234
  • 275
  • just an improvement , Instead using count(*) try to use a key in that table eg : count(primary_Key) will improve your query performance especially if you use innodb . – csf Nov 25 '15 at 16:31
  • @csf would a wrapped SELECT benefit from an index in the subquery's table? – willydee Nov 25 '15 at 16:34
  • @willydee Check http://stackoverflow.com/questions/5179969/what-is-better-in-mysql-count-or-count1 Query optimizer (at least in SQL Server world) is smart enough to get optimal plan between `COUNT(1)/COUNT(*)/COUNT(pk)` – Lukasz Szozda Nov 25 '15 at 16:35
1

If the only thing you want returned is the number of c's who have active subscriptions, then you can simplify your query like this:

SELECT COUNT(DISTINCT c.id) AS cnt
FROM c
INNER JOIN c2t ON c.id = c2t.cid
  AND NOW() BETWEEN COALESCE(dateStart, '0000-00-00')
    AND COALESCE(dateEnd, DATE_ADD(NOW(), INTERVAL 1 DAY))

So, INNER JOIN is used in place of LEFT JOIN: there is no need to return c's with no matches in c2t, since these are not going to have any active subscriptions.

Also, there is no need to GROUP BY: the query returns just one row with the number of c's.

Finally, DISTINCT must be used in COUNT so as to avoid counting duplicate c.id values more than once.

Giorgos Betsos
  • 71,379
  • 9
  • 63
  • 98