## example data: one row per test run
df <- data.frame(
    Test=rep('Test1',15),
    Name=c('M1','M1','M1','M2','M2','M2','M2','M2','M3','M3','M3','M3','M3','M3','M3'),
    Test_Date=as.Date(c('10/16/2011','1/29/2012','1/29/2012','7/26/2011','7/26/2011','5/12/2012','5/12/2012','10/29/2013','9/28/2011','1/8/2012','9/16/2012','6/3/2013','7/11/2013','8/10/2013','9/13/2013'),'%m/%d/%Y')
);
SPAN <- 48*7; ## span in days (48 weeks)
MINTESTS <- 2; ## minimum number of tests within the span for issue='Yes'
## per machine/test group: count the dates within SPAN days of each date, then label the row
df$issue <- ave(as.integer(df$Test_Date),df$Name,df$Test,FUN=function(dates)
    apply(outer(dates,dates,`-`),1,function(diffs)
        if (sum(abs(diffs)<SPAN) >= MINTESTS) 'Yes' else 'No'
    )
);
df;
##     Test Name  Test_Date issue
## 1  Test1   M1 2011-10-16   Yes
## 2  Test1   M1 2012-01-29   Yes
## 3  Test1   M1 2012-01-29   Yes
## 4  Test1   M2 2011-07-26   Yes
## 5  Test1   M2 2011-07-26   Yes
## 6  Test1   M2 2012-05-12   Yes
## 7  Test1   M2 2012-05-12   Yes
## 8  Test1   M2 2013-10-29    No
## 9  Test1   M3 2011-09-28   Yes
## 10 Test1   M3 2012-01-08   Yes
## 11 Test1   M3 2012-09-16   Yes
## 12 Test1   M3 2013-06-03   Yes
## 13 Test1   M3 2013-07-11   Yes
## 14 Test1   M3 2013-08-10   Yes
## 15 Test1   M3 2013-09-13   Yes
Notes:
- I coerced your date strings to Date class using as.Date(c(...),'%m/%d/%Y'), which is necessary to prepare for datewise arithmetic.
- As you can see, I hard-coded SPAN (number of days surrounding a given test date that is considered to be part of its "span") and MINTESTS (minimum number of tests within the span to qualify the row for issue='Yes') as constants in the global environment.
- I had to coerce the first argument to ave() to integer because otherwise ave() would automatically try to coerce the return value to Date class, which would fail, since 'Yes' and 'No' are not valid date strings. This is an annoying behavior of ave() that does not appear to be configurable. Fortunately, the input df$Test_Date does not need to be classed as Date for the way I use it in FUN().
- I group by both df$Name and df$Test, so each machine/test pair is treated separately with respect to whether or not there were MINTESTS tests within SPAN days of a particular test date for that machine/test.
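The ave() coercion issue from the notes above can be reproduced in isolation. This is a minimal sketch on hypothetical toy data (dates, grp, and labelFun are made up for illustration and are not part of the question's data); the Date call is wrapped in tryCatch() because it is expected to fail.

```r
## hypothetical toy data, not the data from the question
dates <- as.Date(c('2012-01-01','2012-01-02','2012-06-01'));
grp <- c('a','a','b');
## hypothetical labelling function: 'Yes' if the group has at least 2 members
labelFun <- function(x) rep(if (length(x) >= 2) 'Yes' else 'No',length(x));

## Date first argument: the character results have to be stored back into a Date vector,
## which is expected to raise an error, caught here
res1 <- tryCatch(ave(dates,grp,FUN=labelFun),error=function(e) conditionMessage(e));

## integer first argument: the same call returns a plain character vector
res2 <- ave(as.integer(dates),grp,FUN=labelFun);
res2;
```

The only difference between the two calls is the class of the first argument, which is why the as.integer() coercion is enough to work around the problem.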
FUN() works by computing the day difference between every pair of dates for that machine/test pair (that's what outer(dates,dates,`-`) calculates). Then, for each row in the resulting difference matrix, it counts how many of the absolute differences are less than SPAN and branches on whether that count is at least MINTESTS: if it is, 'Yes' is returned; if it is not, 'No' is returned. Each date is also compared against itself, so the count always includes the test in question. Thus the issue column results from the ave() call and can be assigned directly to df$issue.
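To make the difference-matrix step concrete, here is a sketch that runs the same logic on the M2 dates alone, outside of ave(). It uses rowSums() and ifelse() as a vectorized stand-in for the apply() call, so it illustrates the idea and is not the exact code above; the names counts and issue are just for this example.

```r
## the five M2 test dates from the example data, as integer day counts
dates <- as.integer(as.Date(c('7/26/2011','7/26/2011','5/12/2012','5/12/2012','10/29/2013'),'%m/%d/%Y'));
SPAN <- 48*7;
MINTESTS <- 2;

## 5x5 matrix of pairwise day differences; the diagonal is zero (each date vs. itself)
diffs <- outer(dates,dates,`-`);

## per row: number of tests within SPAN days, including the test itself
counts <- rowSums(abs(diffs)<SPAN);

## label each row according to the threshold
issue <- ifelse(counts >= MINTESTS,'Yes','No');
data.frame(counts,issue);
```

The first four dates all fall within 336 days of each other, so each has a count of 4; the 2013-10-29 test only matches itself, which gives a count of 1 and therefore 'No', in line with row 8 of the output above.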
Here's one way you could plot this data:
## compute a key frame: one line per machine/test
pairs <- unique(df[,c('Name','Test')]);
## precompute ticks
xtick <- seq(seq(min(df$Test_Date),by='-1 month',len=2)[2],seq(max(df$Test_Date),by='1 month',len=2)[2],'month');
yspace <- 1/(nrow(pairs)+1);
pairs$ytick <- seq(yspace,1-yspace,len=nrow(pairs));
## precompute point colors using named character vector
pointColor <- c(No='red',Yes='blue');
## draw the plot
par(mar=c(6,6,3,3)+0.1,xaxs='i',yaxs='i'); ## set global plot params
plot(NA,xlim=c(min(xtick),max(xtick)),ylim=c(0,1),axes=FALSE,xlab='',ylab=''); ## define plot bounds
with(merge(df,pairs),points(Test_Date,ytick,col=pointColor[issue],pch=4,cex=1)); ## plot points
axis(1,xtick,strftime(xtick,'%Y-%m'),las=2); ## x-axis
axis(2,c(0,pairs$ytick,1),NA,tcl=0); ## y-axis (full extent, no tick marks)
axis(2,pairs$ytick,paste0(pairs$Name,':',pairs$Test),las=1); ## y-axis (just labels and tick marks on main lines)
title('Machine Test Coverage'); ## title
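The xtick line in the plotting code is fairly dense, so here is a sketch of what it does, broken into steps. The variable names lo and hi and the two hard-coded dates (the earliest and latest test dates in the example data) are only for illustration.

```r
## earliest and latest test dates in the example data
dmin <- as.Date('2011-07-26');
dmax <- as.Date('2013-10-29');

## a two-element sequence stepping by one month; the second element is the padded bound
lo <- seq(dmin,by='-1 month',length.out=2)[2]; ## one month before the earliest date
hi <- seq(dmax,by='1 month',length.out=2)[2]; ## one month after the latest date

## monthly ticks from the padded lower bound up to (at most) the padded upper bound
xtick <- seq(lo,hi,'month');
range(xtick);
```

Padding by a month on each side keeps the first and last points away from the plot edges, which matters here because xaxs='i' turns off the default axis padding.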
