2

Could some one please assist me creating a mysql monthly report for active users. A user visits a doctor on a particular date. that date is recorded on the visit date. they are then immediately issued an appointment date of when they should come back. A users active months are those from the month of visit all the way up to appointment date month , plus 90 days grace period.. if they dont pitch on the appointment date, they are given a grace period of 90 days, in which they'll still be considered as active users. after that they wont be considered as active anymore.

Users Table
+------------+------------+------------+
|  UserID    |  visit     |Appointment |  
+------------+------------+------------+
| 10001      | 01-01-2010 | 01-02-2010 | 
| 10001      | 05-02-2010 | 01-03-2010 | 
| 10002      | 20-07-2010 | 15-10-2010 | 
| 10003      | 01-11-2010 | 10-11-2010| 
+------------+------------+------------+

The desired results would be

Monthly Report
+------------+------------+------------+
|  Month     |  active    |            |  
+------------+------------+------------+
| 2010-01    | 1          |            | 
| 2010-02    | 1          |            | 
| 2010-03    | 1          |            | 
| 2010-04    | 1          |            | 
| 2010-05    | 1          |            | 
| 2010-07    | 1          |            |
| 2010-08    | 1          |            | 
| 2010-09    | 1          |            | 
| 2010-10    | 1          |            | 
| 2010-11    | 2          |            | 
| 2010-12    | 2          |            | 
| 2011-01    | 1          |            | 
+------------+------------+------------+

Here is the sql code

  CREATE TABLE `visits` (
   `id` int(10) NOT NULL,
   `userid` int(10) NOT NULL,
   `visit` date NOT NULL,
   `appointment` date NOT NULL
   ) ENGINE=InnoDB DEFAULT CHARSET=latin1;



INSERT INTO `visits` (`id`, `userid`, `visit`, `appointment`) VALUES
(1, 10001, '2010-01-01', '2010-02-01'),
(2, 10001, '2010-02-05', '2010-03-01'),
(3, 10002, '2010-07-20', '2010-10-15'),
(4, 10003, '2010-11-01', '2010-11-10');


ALTER TABLE `visits`
ADD PRIMARY KEY (`id`);

ALTER TABLE `visits`
  MODIFY `id` int(10) NOT NULL AUTO_INCREMENT, AUTO_INCREMENT=10007;
/*!40101 SET CHARACTER_SET_CLIENT=@OLD_CHARACTER_SET_CLIENT */;
/*!40101 SET CHARACTER_SET_RESULTS=@OLD_CHARACTER_SET_RESULTS */;
/*!40101 SET COLLATION_CONNECTION=@OLD_COLLATION_CONNECTION */;
Sam Naholo
  • 33
  • 4
  • 1
    Consider handling issues of data display in application level code. – Strawberry May 15 '17 at 10:33
  • This is a lot more easier to achieve in the application layer. SQL language in general is not really great at creating new records on the fly. It can be done, but it will be ugly. – Shadow May 15 '17 at 11:00
  • An answer to an existing SO question to get you started: http://stackoverflow.com/a/31141256/5389997 – Shadow May 15 '17 at 11:03
  • Noted strawberry. With a million or so records the application layer would take so much time to process Shadow. – Sam Naholo May 15 '17 at 11:04
  • @Shadow Your link is a little bit different from what i need to achieve thanks by the way. – Sam Naholo May 15 '17 at 11:13
  • @SamNaholo no, it is not. It demonstrates how you can fill a gap (create records where data does not exist) in a date range without a calendar table. This is exactly the stuff you need, since you only have a start date and an end date of a period, you need to fill in the gaps. Your question is not an exact duplicate of the other one, since you do not simply want to set the reported value to 0 in case it is missing. – Shadow May 15 '17 at 12:28

2 Answers2

1

For those that might need this.

SELECT
date_format(c.dt,'%Y-%m') AS "month",
c.dt AS date,
a.visit,
a.fu,
COUNT(DISTINCT userid) AS Active
FROM calendar_table c
LEFT JOIN visits v ON c.dt BETWEEN date_format(v.visit,'%Y-%m') AND 
DATE_SUB(date_add(v.appointment, INTERVAL 90 day), INTERVAL 
date_format((LAST_DAY(date_add(v.appointment, INTERVAL 90 day))),'%d')-1 DAY)
WHERE c.d = 1
AND c.y IN (2010,2011)
GROUP BY c.dt
Sam Naholo
  • 33
  • 4
0

It is typical for this type of requirement that you need some form of "tally table" and/or "calendar table". That is, for your column "month", you really need this is a table of some form. This enables you to represent the date range as a set of rows (one per month n this case).

Once you have the date range represented as rows, then use an outer join to visits using between in the join condition. this will then allow you to count how many users are active per time unit.

For the following example I used code from this article on a calendar table

CREATE TABLE calendar_table (
    dt DATE NOT NULL PRIMARY KEY,
    y SMALLINT NULL,
    q tinyint NULL,
    m tinyint NULL,
    d tinyint NULL,
    dw tinyint NULL,
    monthName VARCHAR(9) NULL,
    dayName VARCHAR(9) NULL,
    w tinyint NULL,
    isWeekday BINARY(1) NULL,
    isHoliday BINARY(1) NULL,
    holidayDescr VARCHAR(32) NULL,
    isPayday BINARY(1) NULL
);

CREATE TABLE ints ( i tinyint );

INSERT INTO ints VALUES (0),(1),(2),(3),(4),(5),(6),(7),(8),(9);

INSERT INTO calendar_table (dt)
SELECT DATE('2010-01-01') + INTERVAL a.i*10000 + b.i*1000 + c.i*100 + d.i*10 + e.i DAY
FROM ints a JOIN ints b JOIN ints c JOIN ints d JOIN ints e
WHERE (a.i*10000 + b.i*1000 + c.i*100 + d.i*10 + e.i) <= 11322
ORDER BY 1;

UPDATE calendar_table
SET isWeekday = CASE WHEN dayofweek(dt) IN (1,7) THEN 0 ELSE 1 END,
    isHoliday = 0,
    isPayday = 0,
    y = YEAR(dt),
    q = quarter(dt),
    m = MONTH(dt),
    d = dayofmonth(dt),
    dw = dayofweek(dt),
    monthname = monthname(dt),
    dayname = dayname(dt),
    w = week(dt),
    holidayDescr = '';

Now, with your sample data as per you question, using the following query:

select
      date_format(c.dt,'%Y-%m') as "month"
    , count(distinct userid) as active
from calendar_table c
left join visits v on c.dt between v.visit and date_add(v.appointment, INTERVAL 90 DAY)
where c.y in (2010,2011)
group by
      date_format(c.dt,'%Y-%m')

Note this approach will give you 24 rows (each month over 2 years) so you will get more rows with zeros than in your question's expected results. Just adjust the where clause to suit the desired date range.

See this operating as an example at sqlfiddle

Note: no attention has been given to indexing or performance in this example

Paul Maxwell
  • 33,002
  • 3
  • 32
  • 51
  • Thanks so much @Used_By_Already , I've been over this for weeks now. And really needed this logic,You are almost correct , because the query for some reason doesn't count a user active if the date starts on the 6th of a month. look at user 10002 for example. this user was supposed to be active in month 07, because that's when they started. And the last day of their 90 days grace period wud've been 13-Jan-2010. so by reporting month of 2010-01 , they weren't supposed to be active bcz they didn't complete the month. same as for user 10003. So sorry if i didn't make this clear in my question. – Sam Naholo May 15 '17 at 14:30
  • You need to decide how you will measure. If you do the cakculatiins for each day (which can be done) then do you want the average active per month? Or minimum? Or maximum? Or, what day of the month would you use if not the first day? The 15th? The last day? You have choices to make. If it was me i would use a per day calculatiin averaged for the month. – Paul Maxwell May 15 '17 at 22:33
  • I altered the query, please re-test – Paul Maxwell May 16 '17 at 06:39
  • Hi @Used_By_Already Thank you once again for getting back to me. Everything else is fine. The only remaining this is that , 90 days after the appointment date. or lets say day 90. should fall on the last day of the month , for a user to be counted as active for that month. e.g. if day 90 is the 01-05-2017 or 15-05-2017 or 30-05-2017, that user is not active for May 2017. But if day 90 falls on the 31-05-2017, the user is counted as active for the month of May 2017. even in our test case, user 10001 wasnt supposed to be active for 05-2010. because 90 days from 01-03-2010 if 30-May-2010, – Sam Naholo May 16 '17 at 08:48
  • I'm sorry but I cannot continue like this. I have answered the question, plus Stackoverflow isn't a free coding service. You must now try to s olve your extra problems. Once you give this a try and are still stuck you may consider opening a new question and include the code you have used. – Paul Maxwell May 16 '17 at 11:32
  • In addition I also have a seemingly imposdible pronlem of my own to solve and Im unable to spend time on this. Sorry. – Paul Maxwell May 16 '17 at 11:40
  • I completly understand , I have managed to work out the missing bit, thanks again for pointing me in the right direction – Sam Naholo May 16 '17 at 16:17