5

I have a data set of projects. The projects change status from beginning to end, and the date of the status change is logged in a table (table is named "events" - not my choice). Would look like this (simplified):

Date        Status
2015-06-01  Start
2015-06-03  Stage 2
2015-06-07  Stage 3

In any given date range (to be determined dynamically) I want to be able to see which projects are at which status. However, using BETWEEN or other query against the data will only pull those projects whose status changed during that period, not the ones that are still at a given status.

I've currently created a very clunky solution in Excel which copies rows into new rows between status change dates, like so:

Date          Status  
2015-06-01    Project start
2015-06-02    Project start (copied)
2015-06-03    Stage 2 
2015-06-04    Stage 2 (copied)
2015-06-05    Stage 2 (copied)
2015-06-06    Stage 2 (copied)
2015-06-07    Stage 3

This solution allows me to query the status for the project on, say, 2015-06-06 and see that it is still at Stage 2.

Is there some way I can use mySql to pull this same data, but as output to a query? I've heard some suggest to use a Calendar table, but I'm not sure how that would work. I've also seen someone recommend a Cross Join, but again, I couldn't understand from the description how that would work.

Thanks in advance for your help!

2 Answers2

1

plan

  • create calendar table by cross joining digits and date_add over calendar period..
  • join your data to calendar source with date <= calendar date
  • take max of date <= calendar date
  • join back to original data source to get status

setup

drop table if exists calendar_t;
CREATE TABLE calendar_t (
  id integer primary key auto_increment not null,
  `date` date not null,
  day varchar(9) not null,
  month varchar(13) not null,
  `year` integer not null
);

drop view if exists digits_v;
create view digits_v
as
select 0 as n
union all
select 1
union all
select 2
union all
select 3
union all
select 4
union all
select 5
union all
select 6
union all
select 7
union all
select 8
union all
select 9
;

insert into calendar_t
( `date`, day, month, `year` )
select 
date_add('2015-01-01', interval 100*a2.n + 10*a1.n + a0.n day) as `date`,
dayname(date_add('2015-01-01', interval 100*a2.n + 10*a1.n + a0.n day)) as day,
monthname(date_add('2015-01-01', interval 100*a2.n + 10*a1.n + a0.n day)) as month,
year(date_add('2015-01-01', interval 100*a2.n + 10*a1.n + a0.n day)) as `year`
from
digits_v a2
cross join digits_v a1
cross join digits_v a0
order by date_add('2015-01-01', interval 100*a2.n + 10*a1.n + a0.n day)
;

drop table if exists example;
create table example
(
  `date` date not null,
  status varchar(23) not null
);

insert into example
( `date`, status )
values
( '2015-06-01',  'Start'   ),
( '2015-06-03',  'Stage 2' ),
( '2015-06-07',  'Stage 3' )
;

query

select cal_date, mdate, ex2.status
from
(
select cal_date, max(ex_date) as mdate
from
(
select cal.`date` as cal_date, ex.`date` as ex_date
from calendar_t cal
inner join example ex
on ex.`date` <= cal.`date`
) maxs
group by cal_date
) m2
inner join example ex2
on m2.mdate = ex2.`date`
-- pick a reasonable end date for filtering..
where cal_date <= date('2015-06-15')
order by cal_date
;

output

+------------------------+------------------------+---------+
|        cal_date        |         mdate          | status  |
+------------------------+------------------------+---------+
| June, 01 2015 00:00:00 | June, 01 2015 00:00:00 | Start   |
| June, 02 2015 00:00:00 | June, 01 2015 00:00:00 | Start   |
| June, 03 2015 00:00:00 | June, 03 2015 00:00:00 | Stage 2 |
| June, 04 2015 00:00:00 | June, 03 2015 00:00:00 | Stage 2 |
| June, 05 2015 00:00:00 | June, 03 2015 00:00:00 | Stage 2 |
| June, 06 2015 00:00:00 | June, 03 2015 00:00:00 | Stage 2 |
| June, 07 2015 00:00:00 | June, 07 2015 00:00:00 | Stage 3 |
| June, 08 2015 00:00:00 | June, 07 2015 00:00:00 | Stage 3 |
| June, 09 2015 00:00:00 | June, 07 2015 00:00:00 | Stage 3 |
| June, 10 2015 00:00:00 | June, 07 2015 00:00:00 | Stage 3 |
| June, 11 2015 00:00:00 | June, 07 2015 00:00:00 | Stage 3 |
| June, 12 2015 00:00:00 | June, 07 2015 00:00:00 | Stage 3 |
| June, 13 2015 00:00:00 | June, 07 2015 00:00:00 | Stage 3 |
| June, 14 2015 00:00:00 | June, 07 2015 00:00:00 | Stage 3 |
| June, 15 2015 00:00:00 | June, 07 2015 00:00:00 | Stage 3 |
+------------------------+------------------------+---------+

sqlfiddle


reference

Community
  • 1
  • 1
amdixon
  • 3,814
  • 8
  • 25
  • 34
  • Thanks, amdixon. I'll give that a shot when I'm back in the office. In the meantime, I'll work on figuring out what it means :) I've never used Cross Joins before, and wondered what they were used for. I think I'm about to find out. – Rick Schultz Dec 31 '15 at 20:15
  • cross join means take all combinations of records from left and right datasets. so if you cross join a dataset with 2 records with a dataset with 5 records, you end up with a dataset with 10 records. check out [cross join](http://www.w3resource.com/sql/joins/cross-join.php). in this context its a way to generate a numeric sequence – amdixon Dec 31 '15 at 23:09
  • @RyanVincent yes you can use this as a readily available datasource ( for sequence generation ) for calendar population. efficiency is probably not a concern, as the calendar load will only happen a maximum of once a year. just make sure there are no gaps in the id sequence.. – amdixon Dec 31 '15 at 23:56
0

you dont need to create a table with all the dates. you could alter your table to give the start and end dates for each status and use a between statement.

or using your existing data.

using @datequery as the date you want to find out the status for.

Select top 1 Status from Events
where Date <= @datequery and Date 
order by Date desc

returns the most recent status change before the date you are querying.

@datequery = 2015-06-06

Status
Stage 2
pancho018
  • 587
  • 8
  • 21