Using intervals to assign categorical values

Question

Take the following generic data

A <- c(5,7,11,10,23,30,24,6)
B <- c(1,2,3,1,2,3,1,2)
C <- data.frame(A,B)

and the following intervals

library(intervals)
interval1 <- Intervals(
  matrix(
    c(
      5, 15,
      15, 25,
      25, 35,
      35, 100
    ),
    ncol = 2, byrow = TRUE
  ),
  closed = c( TRUE, FALSE ),
  type = "Z"
)
rownames(interval1) <- c("A","B","C", "D")

interval2 <- Intervals(
  matrix(
    c(
      0, 10,
      12, 20,
      22, 30,
      30, 100
    ),
    ncol = 2, byrow = TRUE
  ),
  closed = c( TRUE, FALSE ),
  type = "Z"
)
rownames(interval2) <- c("P","Q","R", "S")

Now I want to create the following output table

[image: expected output table, not available]

So where an A value falls within both sets of intervals, I want to 'copy' all of its data to a line below. We also introduce data$X, which is the interval1 value, and data$y, which is the interval2 value. Where a value does not fit within any of the intervals, I want to remove it from the data.frame.

I am not sure whether the break() function would be better for creating the intervals, or whether dplyr can be used to make the recurring data rows.

I do not understand. Sorry but your explanation is not clear enough. Can you elaborate how you obtain the 4 first lines of your final `data.frame`? — Colonel Beauvel, May 18 '15 at 12:00
I hope this will clarify.... the value of 5 appears in the interval1 as 'A' and interval2 'P'... the value of 7 appears in the interval1 as 'A' and interval2 'P'... the value of 11 appears in the interval1 as 'A' but not within any interval2 bounds — lukeg, May 18 '15 at 12:04

BrodieG · Accepted Answer · 2015-05-18T13:43:08.820

You can use foverlaps in data.table:

library(data.table)
C.DT <- data.table(C)
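C.DT[, A1 := A]  # `foverlaps` needs a start and an end column, so copy A to search the range [A, A]

# `D` and `E` are your interval matrices. They are not defined in the question, so they are
# assumed here to be plain matrices holding the bounds of interval1 and interval2.
D <- matrix(c(5, 15, 15, 25, 25, 35, 35, 100), ncol = 2, byrow = TRUE)
E <- matrix(c(0, 10, 12, 20, 22, 30, 30, 100), ncol = 2, byrow = TRUE)

# data.frame() names the unnamed matrix columns X1 and X2
I1 <- data.table(cbind(data.frame(D), idX=LETTERS[1:4], idY=NA))
I2 <- data.table(cbind(data.frame(E), idX=NA, idY=LETTERS[16:19]))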

setkey(I1, X1, X2)  # set the keys on our interval ranges
setkey(I2, X1, X2)

rbind(
  foverlaps(C.DT, I1, by.x=c("A", "A1"), nomatch=0), # match every value in `C.DT$A` to the ranges in `I1` 
  foverlaps(C.DT, I2, by.x=c("A", "A1"), nomatch=0)
)[order(A, B), .(A, B, X=idX, Y=idY)]

Produces:

     A B  X  Y
 1:  5 1  A NA
 2:  5 1 NA  P
 3:  6 2  A NA
 4:  6 2 NA  P
 5:  7 2  A NA
 6:  7 2 NA  P
 7: 10 1  A NA
 8: 10 1 NA  P
 9: 11 3  A NA
10: 23 2  B NA
11: 23 2 NA  R
12: 24 1  B NA
13: 24 1 NA  R
14: 30 3  C NA
15: 30 3 NA  R
16: 30 3 NA  S

Note that you can easily change what appears in place of NA by modifying the steps where I1 and I2 are created.
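
One thing worth checking in the output above: 10 is matched to P (0 to 10) and 30 to R (22 to 30), so the upper bounds are being treated as inclusive, whereas the question declares its intervals with closed = c(TRUE, FALSE). If the half-open reading is the one wanted, a rough base R sketch could look like the following. The lookup tables and the helper `label_half_open` are hypothetical names made up for illustration, and the sketch assumes the intervals within each table do not overlap one another.

```r
# Sample data from the question
C <- data.frame(A = c(5, 7, 11, 10, 23, 30, 24, 6),
                B = c(1, 2, 3, 1, 2, 3, 1, 2))

# Hypothetical lookup tables with the same bounds as interval1 and interval2
I1 <- data.frame(lower = c(5, 15, 25, 35), upper = c(15, 25, 35, 100),
                 id = c("A", "B", "C", "D"), stringsAsFactors = FALSE)
I2 <- data.frame(lower = c(0, 12, 22, 30), upper = c(10, 20, 30, 100),
                 id = c("P", "Q", "R", "S"), stringsAsFactors = FALSE)

# Hypothetical helper: label of the interval [lower, upper) containing each
# value, or NA when the value falls in no interval.
label_half_open <- function(x, tab) {
  vapply(x, function(v) {
    hit <- tab$id[v >= tab$lower & v < tab$upper]
    if (length(hit) == 1) hit else NA_character_
  }, character(1))
}

# One block of rows per interval set, mirroring the X / Y layout above
x_rows <- data.frame(C, X = label_half_open(C$A, I1), Y = NA_character_,
                     stringsAsFactors = FALSE)
y_rows <- data.frame(C, X = NA_character_, Y = label_half_open(C$A, I2),
                     stringsAsFactors = FALSE)

# Drop rows that matched nothing, then stack and sort
res <- rbind(x_rows[!is.na(x_rows$X), ], y_rows[!is.na(y_rows$Y), ])
res <- res[order(res$A, res$B), ]
rownames(res) <- NULL
res
```

Under this reading 10 gets no Y label and 30 maps only to S, so the result has two rows fewer than the `foverlaps` output. Whether that matches the table you are after depends on how the endpoints are meant to behave.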

edited May 18 '15 at 13:43

answered May 18 '15 at 13:11

Thanks, that works great. Can you please explain the setkey() function? – lukeg May 18 '15 at 13:31
@lukeg That is a `data.table` function that orders your table by the columns selected, which then allows `data.table` to search through those columns knowing they are ordered (this allows for fast searches). – BrodieG May 18 '15 at 13:38
@lukeg, you should post a new question that captures the additional complexity while trying to keep the problem as simple as possible. – BrodieG May 18 '15 at 17:49
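
To make the `setkey()` explanation in the comments concrete, here is a small sketch on a toy table. The table `DT` and its deliberately shuffled rows are made up for illustration; only `setkey()` and `key()` come from `data.table`.

```r
library(data.table)

# Toy table with the same bounds as interval2, deliberately out of order
DT <- data.table(X1 = c(22, 0, 30, 12),
                 X2 = c(30, 10, 100, 20),
                 id = c("R", "P", "S", "Q"))

setkey(DT, X1, X2)  # sorts DT by reference and marks X1, X2 as its key

key(DT)  # the key columns that foverlaps() picks up as the interval bounds
DT$id    # rows are now ordered by X1, then X2
```

Because the sort happens by reference, there is no need to assign the result back to `DT`.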
