Using the lubridate library, I can find out whether two time periods overlap. But is there an efficient way to compute how many days they overlap? For instance, how many days did a woman smoke while pregnant? The pregnancy period and the smoking period may overlap totally, partially, or not at all.

Here is an example with three women:

preg_start<-as.Date(c("2011-01-01","2012-01-01","2013-01-01"))
preg_end<-preg_start+270 # end after 9 months
smoke_start<-as.Date(c("2011-02-01","2012-08-01","2014-01-01"))
smoke_end<-smoke_start+100 # all three smoked 100 days

data<-data.frame(preg_start,preg_end,smoke_start,smoke_end) # no cbind(): it would coerce the dates to numbers

I want to add a variable saying that the first woman smoked 100 days during pregnancy, the second smoked 30 days and the third did not smoke while pregnant.

lovalery
  • If you provide some sample data like described here http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example this will make it easier to help you. It is also a good idea to include your desired output in your question – talat May 12 '14 at 11:10
  • This keeps coming up in my search for overlapping time periods. Should the title be changed to something like "R Time period overlap in days"? – ARobertson Dec 12 '17 at 01:23

1 Answer

Use interval() to create time intervals for pregnancy and smoking, then take the intersection of these intervals with intersect(). From that you can calculate the length of the overlap in days.

library("lubridate")
preg_start<-as.Date(c("2011-01-01","2012-01-01","2013-01-01"))
preg_end<-preg_start+270 # end after 9 months
smoke_start<-as.Date(c("2011-02-01","2012-08-01","2014-01-01"))
smoke_end<-smoke_start+100 # all three smoked 100 days

# interval() replaces new_interval(), which is deprecated in lubridate
smoke <- interval(smoke_start, smoke_end, tzone="UTC")
preg <- interval(preg_start, preg_end, tzone="UTC")
day(as.period(intersect(smoke, preg), "days"))

I get 100, 57 and 0 days of smoking during pregnancy.
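For comparison, the same overlap can be sketched in base R without lubridate. This is only an illustration, not part of the original answer: it takes the later of the two start dates and the earlier of the two end dates, and treats a negative difference as no overlap. The names overlap_start, overlap_end and overlap_days are made up for this example.

```r
# Same sample data as in the question
preg_start  <- as.Date(c("2011-01-01", "2012-01-01", "2013-01-01"))
preg_end    <- preg_start + 270
smoke_start <- as.Date(c("2011-02-01", "2012-08-01", "2014-01-01"))
smoke_end   <- smoke_start + 100

# The overlap runs from the later start to the earlier end
overlap_start <- pmax(preg_start, smoke_start)
overlap_end   <- pmin(preg_end, smoke_end)

# Difference in days; negative values mean the periods never overlap
overlap_days <- pmax(0, as.numeric(overlap_end - overlap_start))
overlap_days
# 100 57 0
```

Because it relies only on pmax() and pmin(), this version returns 0 rather than NA when the periods do not overlap.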

thelatemail
nnn
  • Shouldn't it be 101, 58, and 1? It isn't counting the first day of overlap (which could be crucial in cases where the number of days of overlap is, say, less than 10). – AmagicalFishy Jun 10 '16 at 19:25
  • @AmagicalFishy That is a problem. However, as this method (using interval, as new_interval is now deprecated) will return NA when the time periods don't overlap at all, it's easy enough to add 1 to the whole vector to correct it. – plagueheart Jul 19 '19 at 21:59
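If both the first and the last day of the overlap should be counted, as the comments discuss, one possible base R sketch adds 1 only where the periods actually overlap. The variable names has_overlap and inclusive_days are hypothetical, and whether an inclusive count is the right definition depends on how the start and end dates were recorded.

```r
# Same sample data as in the question
preg_start  <- as.Date(c("2011-01-01", "2012-01-01", "2013-01-01"))
preg_end    <- preg_start + 270
smoke_start <- as.Date(c("2011-02-01", "2012-08-01", "2014-01-01"))
smoke_end   <- smoke_start + 100

overlap_start <- pmax(preg_start, smoke_start)
overlap_end   <- pmin(preg_end, smoke_end)

# TRUE where the two periods share at least one day
has_overlap <- overlap_end >= overlap_start

# Count both endpoints of the overlap; report 0 where there is none
inclusive_days <- ifelse(has_overlap,
                         as.numeric(overlap_end - overlap_start) + 1,
                         0)
inclusive_days
# 101 58 0
```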