I have three conditions, all of which must be satisfied, that determine which case a particular record is placed into. The variables x, y and z each range over [1,10]. My input would be the lower and upper bounds of each condition for each case. I understand that with only one condition I could compare the ranges directly, i.e.

case1: [a,b] case2: [c,d] and check a <= d and c <= b
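As a minimal sketch of that one-condition check in R (the helper name `intervals_overlap` is made up for illustration, and both intervals are assumed to be closed):

```r
# Hypothetical helper, not from any package: closed intervals [a, b] and
# [c, d] share at least one point exactly when a <= d and c <= b.
intervals_overlap <- function(a, b, c, d) {
  a <= d & c <= b
}

intervals_overlap(1, 2, 2, 5)   # the intervals share the endpoint 2
intervals_overlap(1, 2, 3, 5)   # the intervals are disjoint
```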

The goal is to define cases from these conditions and have the output tell me which cases overlap with each other, so that I can redefine them.

However, I am not sure how to extend this logic to the intersection of the conditions and code it in R. TIA

Sample code: this table provides the conditions for the case statement

structure(list(Case = c(1, 2, 3, 4, 5, 6), x_lower = c(9, 1, 
9, 3, 3, 1), x_upper = c(10, 2, 10, 5, 6, 2), y_lower = c(9, 
1, 1, 4, 4, 1), y_upper = c(10, 2, 2, 6, 7, 2), z_lower = c(9, 
1, 1, 3, 3, 1), z_upper = c(10, 2, 2, 4, 5, 2), `Overlapping Case` = c(NA, 
NA, NA, NA, NA, NA)), row.names = c(NA, -6L), class = c("tbl_df", 
"tbl", "data.frame"))

Some data to check the logic. Case2 and Case6 always overlap, and I believe Case4 is always contained in Case5.

library(tidyverse)
set.seed(10)
dat=data.frame(x=sample(1:10,size=1000,replace=TRUE),y=sample(1:10,size=1000,replace=TRUE),
               z=sample(1:10,size=1000,replace=TRUE)) %>% 
 mutate(Case= case_when( between(x,9,10) & between(y,9,10) & between(z,9,10)~ "Case1",
             between(x,1,2) & between(y,1,2) & between(z,1,2)~"Case2",
             between(x,9,10) & between(y,1,2) & between(z,1,2)~"Case3",
             between(x,3,5) & between(y,4,6) & between(z,3,4)~"Case4",
             between(x,3,6) & between(y,4,7) & between(z,3,5)~"Case5",
             between(x,1,2) & between(y,1,2) & between(z,1,2)~"Case6",
             TRUE ~ "Other"))
te time
  • It's easier to help you if you include a simple [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) with sample input and desired output that can be used to test and verify possible solutions. – MrFlick Apr 16 '20 at 00:56
  • Can you clarify what the conditions are to make an overlap? I think it's: if there exists a triple (x, y, z) that simultaneously satisfies all conditions across multiple cases, that's an overlap. Did I get that right? – Aaron Montgomery Apr 16 '20 at 02:02
  • @AaronMontgomery Yes, I think that is it, for example in the sample table (3,6,9) or (2,4,10) etc. would fall into both cases 2 and 3 – te time Apr 16 '20 at 03:52
  • Without a minimal working example (i.e. a dataset that we can copy/paste into our own R consoles) as @MrFlick suggested, it's very difficult to help in a meaningful way. But, one tool that may be useful: you can use `&` to join together multiple logical conditions with AND, and you can use `|` to join conditions with OR. – Aaron Montgomery Apr 16 '20 at 11:15
  • @AaronMontgomery I added some sample data hopefully it will make finding a solution easier. – te time Apr 16 '20 at 17:01

1 Answer

Here's one way (of many) you could go about it. Comments interspersed below.

# given structure stored as "df"

#-----------

# Establish the overlap() function. This function takes as input
# two rows from df and outputs a logical indicator of overlap
overlap <- function(a, b){
  a <- unlist(a)   # flatten these into vectors for use in apply() later
  b <- unlist(b)
  all(a["x_lower"] <= b["x_upper"], b["x_lower"] <= a["x_upper"],
      a["y_lower"] <= b["y_upper"], b["y_lower"] <= a["y_upper"],
      a["z_lower"] <= b["z_upper"], b["z_lower"] <= a["z_upper"])
}

# testing the new function
overlap(df[4, ], df[5, ])
  ## [1] TRUE
overlap(df[2, ], df[5, ])
  ## [1] FALSE
overlap(df[2, ], df[6, ])
  ## [1] TRUE

#-----------

# Define the check_other_rows() function, which takes as input
# a number of a row in df and produces as output a comma-separated
# string (possibly empty) of matching rows
check_other_rows <- Vectorize(function(i){
  apply(df, 1, overlap, b = df[i, ]) %>% 
    which %>%          # record which rows had overlap with row i
    setdiff(i) %>%     # remove trivial value of i from list
    paste(collapse = ", ")   # collapse all matches into a single string
  }
)

# testing the new function
check_other_rows(1)
  ## [1] ""
check_other_rows(2)
  ## [1] "6"

#-----------

a <- check_other_rows(1:nrow(df))
# this works because we vectorized check_other_rows(); rows with no
# overlap come back as empty strings, so convert those to NA
a[a == ""] <- NA_character_
a
  ## [1] NA  "6" NA  "5" "4" "2"

# store the result into the desired column of df
df$`Overlapping Case` <- a
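
As a cross-check, here is a rough sketch of a different route that compares every pair of cases at once with `outer()` instead of looping over rows. It is not the method above, only an alternative under stated assumptions: the bounds are retyped from the question's sample table into a plain data frame called `cases`, and the helper `axis_overlap` is a made-up name.

```r
# Assumed input: the bounds from the question's sample table
cases <- data.frame(
  Case    = 1:6,
  x_lower = c(9, 1, 9, 3, 3, 1), x_upper = c(10, 2, 10, 5, 6, 2),
  y_lower = c(9, 1, 1, 4, 4, 1), y_upper = c(10, 2, 2, 6, 7, 2),
  z_lower = c(9, 1, 1, 3, 3, 1), z_upper = c(10, 2, 2, 4, 5, 2)
)

# Hypothetical helper: entry [i, j] is TRUE when the intervals of cases i
# and j overlap on one variable (lower[i] <= upper[j] and lower[j] <= upper[i])
axis_overlap <- function(lower, upper) {
  le <- outer(lower, upper, "<=")
  le & t(le)
}

# Two cases overlap only if they overlap on x, y and z simultaneously
overlap_matrix <- with(cases,
  axis_overlap(x_lower, x_upper) &
  axis_overlap(y_lower, y_upper) &
  axis_overlap(z_lower, z_upper)
)
diag(overlap_matrix) <- FALSE   # ignore each case overlapping itself

# One comma-separated string per case, NA when nothing overlaps
overlapping <- apply(overlap_matrix, 1, function(row) {
  hits <- cases$Case[row]
  if (length(hits) == 0) NA_character_ else paste(hits, collapse = ", ")
})
overlapping
  ## [1] NA  "6" NA  "5" "4" "2"
```

The full logical matrix is also handy on its own if you want to inspect which pairs collide before redefining the cases.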
Aaron Montgomery