
I want to find out whether one date interval falls within another date interval. Each interval is defined by two columns: one by M_start.x and M_end.x, the other by M_start.y and M_end.y.

Can somebody come up with an easy solution?

I tried a simple subset:

subset(Dataset, M_start.x <= M_start.y >= M_end.x | M_start.x <= M_end.y >= M_end.x)

But this does not work:

Error: unexpected '>=' in "Marker1and2_2 <- subset(Marker1and2_1, M_start.x <= M_start.y >="
M--
  • Have a look at `lubridate::int_overlaps` – Andrew Gustar Nov 11 '19 at 16:17
  • You cannot chain multiple comparisons in one expression. Split them and combine the parts with `&` so that both conditions must hold. Alternatively, install the `dplyr` package and use `dplyr::between(M_start.y, M_start.x, M_end.x)`. – M-- Nov 11 '19 at 16:18
  • Moreover, you should provide a [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) – M-- Nov 13 '19 at 15:16

1 Answer


To get quicker answers, it helps (and is often required) to include a reproducible example, such as:

set.seed(123)  # make the random dates reproducible
ndates <- 5
# Each column is an origin date plus a random offset of 0 to 31 days
Dataset <- data.frame(
  M_start.x = as.Date(round(runif(ndates) * 31), origin = "2019-01-01"),
  M_end.x   = as.Date(round(runif(ndates) * 31), origin = "2019-02-01"),
  M_start.y = as.Date(round(runif(ndates) * 31), origin = "2019-01-15"),
  M_end.y   = as.Date(round(runif(ndates) * 31), origin = "2019-02-15")
)



Dataset

>     M_start.x    M_end.x  M_start.y    M_end.y
>  1 2019-01-10 2019-02-02 2019-02-14 2019-03-15
>  2 2019-01-25 2019-02-17 2019-01-29 2019-02-23
>  3 2019-01-14 2019-03-01 2019-02-05 2019-02-16
>  4 2019-01-28 2019-02-18 2019-02-02 2019-02-25
>  5 2019-01-30 2019-02-15 2019-01-18 2019-03-17


subset(Dataset, (M_start.x <= M_start.y & M_start.y >= M_end.x) |
         (M_start.x <= M_end.y & M_end.y >= M_end.x))

>     M_start.x    M_end.x  M_start.y    M_end.y
>  1 2019-01-10 2019-02-02 2019-02-14 2019-03-15
>  2 2019-01-25 2019-02-17 2019-01-29 2019-02-23
>  4 2019-01-28 2019-02-18 2019-02-02 2019-02-25
>  5 2019-01-30 2019-02-15 2019-01-18 2019-03-17

The key is that R does not allow chained comparisons on both sides of a variable, such as:

M_start.x <= M_start.y >= M_end.x

This would work in SAS (which I'm guessing is where you are coming from).

Instead, use something like this:

(M_start.x <= M_start.y & M_start.y >= M_end.x)

making sure to wrap each compound condition in parentheses so that it is evaluated as a single expression.
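
If "falls into" is read as the .y interval lying entirely inside the .x interval, the same split-comparison pattern applies with both inequalities pointing the same way. The following is only a sketch under that assumption (the question does not state it explicitly) and assumes closed intervals. The data frame `d` hard-codes the five sample rows printed above, and `y_within_x` is a hypothetical column name.

```r
# Sample rows copied from the printed Dataset above
d <- data.frame(
  M_start.x = as.Date(c("2019-01-10", "2019-01-25", "2019-01-14", "2019-01-28", "2019-01-30")),
  M_end.x   = as.Date(c("2019-02-02", "2019-02-17", "2019-03-01", "2019-02-18", "2019-02-15")),
  M_start.y = as.Date(c("2019-02-14", "2019-01-29", "2019-02-05", "2019-02-02", "2019-01-18")),
  M_end.y   = as.Date(c("2019-03-15", "2019-02-23", "2019-02-16", "2019-02-25", "2019-03-17"))
)

# Assumption: "falls into" means both endpoints of y lie in [M_start.x, M_end.x].
# Each chained comparison a <= b <= c is written as (a <= b & b <= c).
d$y_within_x <- with(d,
  (M_start.x <= M_start.y & M_start.y <= M_end.x) &
  (M_start.x <= M_end.y   & M_end.y   <= M_end.x)
)

# Keep only the rows where y is contained in x
contained <- subset(d, y_within_x)
print(contained)
```

On the sample data this keeps only row 3, which is the one row the filter shown earlier drops, so it is worth confirming which reading is intended before choosing between them.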

alex keil
  • This inequality is not good enough to find an overlap - e.g. if one interval is a subset of the other. – Andrew Gustar Nov 11 '19 at 16:30
  • Not sure what you mean @AndrewGustar. The code above does exactly that, provided that the intervals were intended to be closed, per the user's original use of ">=" and "<=" – alex keil Nov 11 '19 at 16:35
  • You need `StartX <= StartY <= EndX | StartX <= EndY <= EndX` to guarantee an overlap. The OP's version is also wrong! – Andrew Gustar Nov 11 '19 at 16:52
  • OP is asking about date intervals that fall entirely within another interval. Your code selects overlap where only one point falls within the other interval, but I'm not sure where you're getting that the OP is asking for that. Perhaps you should ask OP for a clarification. – alex keil Nov 12 '19 at 00:34
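
The overlap readings discussed in these comments can be compared directly in base R. The sketch below assumes closed intervals with start <= end. `overlaps_any` and `endpoint_in_x` are hypothetical helper names introduced here for illustration, and the dates are the five sample rows from the answer.

```r
# Hypothetical helpers; both assume closed intervals with start <= end.

# TRUE when [s1, e1] and [s2, e2] share at least one day
overlaps_any <- function(s1, e1, s2, e2) {
  s1 <= e2 & s2 <= e1
}

# Endpoint test from the comment: TRUE when an endpoint of y lies inside x
endpoint_in_x <- function(sx, ex, sy, ey) {
  (sx <= sy & sy <= ex) | (sx <= ey & ey <= ex)
}

# Sample rows from the answer
sx <- as.Date(c("2019-01-10", "2019-01-25", "2019-01-14", "2019-01-28", "2019-01-30"))
ex <- as.Date(c("2019-02-02", "2019-02-17", "2019-03-01", "2019-02-18", "2019-02-15"))
sy <- as.Date(c("2019-02-14", "2019-01-29", "2019-02-05", "2019-02-02", "2019-01-18"))
ey <- as.Date(c("2019-03-15", "2019-02-23", "2019-02-16", "2019-02-25", "2019-03-17"))

any_overlap <- overlaps_any(sx, ex, sy, ey)
endpoint    <- endpoint_in_x(sx, ex, sy, ey)
print(data.frame(any_overlap, endpoint))
```

On these rows the two tests differ only for row 5, where the .x interval sits entirely inside the .y interval, so neither endpoint of y lies within x even though the intervals overlap.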