R: How to case-control match by age (1:10)

Question

I am new to R and am struggling to find code to age-match 10 controls per case. All the cases and controls are in one data frame and are labelled 'Case' or 'Control' in a 'Group' column. I want to make a new data frame in which each case is matched to 10 controls whose age is within +/- 2 years of that case's age. My data frame has 112 cases and 4910 controls, which should be enough. Below is a small portion of my data frame; please let me know if the sample is too small:

structure(list(Sex = c("F", "M", "F", "M", "M", "M", "F", "F", 
"F", "F", "M", "M", "M", "M", "F", "F", "M", "M", "F", "F", "F", 
"F", "M", "F", "F", "F", "F", "M", "M", "M", "M", "M", "M", "F", 
"F", "M", "F", "F", "M", "F", "F", "M", "M", "M", "F", "M", "F", 
"F", "M", "F", "M", "M", "M", "M", "F", "F", "M", "F", "M", "F", 
"M", "M", "F", "F", "F", "M", "F", "F", "F", "M", "M", "F", "M", 
"M", "M", "F", "F", "F", "M", "M", "F", "M", "M", "F", "F", "M", 
"F", "M", "M", "F", "F", "F", "M", "M", "M", "F", "F", "F", "M", 
"M", "F", "F", "F", "M", "F", "F", "M", "M", "M", "F", "F", "F", 
"M", "F"), mcv = c(89, 90, 86, 87, 90, 88, 85, 90, 92, 89, 87, 
95, 92, 94, 89, 87, 93, 90, 96, 94, 88, 101, 83, 97, 79, 91, 
92, 89, 90, 93, 88, 94, 92, 89, 97, 98, 80, 92, 87, 95, 85, 91, 
89, 89, 94, 77, 92, 92, 82, 92, 85, 105, 96, 102, 89, 87, 87, 
95, 93, 88, 93, 82, 88, 86, 87, 88, 89, 89, 91, 90, 90, 85, 95, 
88, 91, 88, 87, 92, 91, 92, 92, 80, 80, 96, 85, 90, 88, 89, 86, 
91, 91, 76, 94, 86, 94, 84, 88, 92, 101, 91, 93, 98, 98, 91, 
86, 84, 91, 90, 88, 88, 83, 91, NA, 101), Age = c(52, 63, 72, 
52, 66, 59, 51, 63, 68, 53, 64, 70, 70, 78, 59, 55, 54, 54, 83, 
61, 51, 72, 57, 67, 72, 52, 55, 52, 95, 79, 60, 61, 73, 69, 65, 
55, 53, 77, 79, 54, 64, 54, 65, 71, 63, 52, 54, 63, 69, 70, 56, 
80, 54, 67, 59, 71, 56, 73, 53, 61, 71, 73, 74, 63, 82, 60, 52, 
65, 75, 66, 74, 71, 58, 52, 53, 55, 91, 73, 62, 51, 74, 73, 64, 
60, 58, 63, 63, 59, 72, 52, 85, 51, 61, 56, 60, 64, 73, 78, 57, 
52, 62, 64, 70, 62, 58, 69, 84, 72, 71, 63, 73, 63.3, 62.3, 59.56
), Group = c("Control", "Control", "Control", "Control", "Control", 
"Control", "Control", "Control", "Control", "Control", "Control", 
"Control", "Control", "Control", "Control", "Control", "Control", 
"Control", "Control", "Control", "Control", "Control", "Control", 
"Control", "Control", "Control", "Control", "Control", "Control", 
"Control", "Control", "Control", "Control", "Control", "Control", 
"Control", "Control", "Control", "Control", "Control", "Control", 
"Control", "Control", "Control", "Control", "Control", "Control", 
"Control", "Control", "Control", "Control", "Control", "Control", 
"Control", "Control", "Control", "Control", "Control", "Control", 
"Control", "Control", "Control", "Control", "Control", "Control", 
"Control", "Control", "Control", "Control", "Control", "Control", 
"Control", "Control", "Control", "Control", "Control", "Control", 
"Control", "Control", "Control", "Control", "Control", "Control", 
"Control", "Control", "Control", "Control", "Control", "Control", 
"Control", "Control", "Control", "Control", "Control", "Control", 
"Control", "Control", "Control", "Control", "Control", "Control", 
"Control", "Control", "Control", "Control", "Control", "Control", 
"Control", "Control", "Control", "Control", "Case", "Case", "Case"
)), row.names = c(NA, -114L), class = c("tbl_df", "tbl", "data.frame"
))

I have tried code from other questions, but it doesn't work:

Matching controls to cases using multiple conditions in r

library(dplyr, warn.conflicts = F)

dat %>%
  split(.$group) %>%
  list2env(envir = .GlobalEnv)

control$FILTER <- FALSE
control

set.seed(123)

for(i in seq_len(nrow(case))){
  x <- which(between(control$age, case$age[i] -2, case$age[i] +2) & 
               !control$FILTER)
  control$FILTER[sample(x, min(10, length(x)))] <- TRUE
}

control

bind_rows(case, control) %>% filter(FILTER | is.na(FILTER)) %>% select(-FILTER)

The result of the code above was missing 30 controls.

library(purrr)  # provides map_df()

case_data <- dat %>% filter(group == 'case')
control_data <- dat %>% filter(group == 'control')

case_data %>%
  group_split(row_number(), .keep = FALSE) %>%
  map_df(~bind_rows(.x, control_data %>% 
                    filter(between(age, .x$age - 2, .x$age + 2)) %>%
        slice_sample(n = 10)))

The code above produced an error:

Error in `slice_sample()`:
! Problem while computing indices.
Caused by error in `sample.int()`:
! invalid first argument

My expected outcome is:

mcv	Age	Group
100	62	Case
99	61	Control
98	63	Control
101	60	Control
87	64	Control
98	62	Control
95	62	Control
99	63	Control
97	60	Control
90	63	Control
102	64	Control
98	70	Case
90	69	Control
98	70	Control
99	71	Control
100	71	Control
98	72	Control
96	68	Control
109	68	Control
98	69	Control
90	70	Control
100	70	Control

And so on...

Does anyone know another approach, or why these attempts don't work? I appreciate any help.

Hi - Welcome to SO! In order to better help can you provide a minimal working example of your data? You can use the dput() function in R to get started. Currently, we do not know what the `dat` object is so are unable to help. Providing a data example will yield a higher probability of your question being answered promptly. Example of dput: https://stackoverflow.com/questions/49994249/example-of-using-dput. Also a link on reproducible examples: https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example — bs93, Jan 15 '23 at 03:37
I'm assuming the data is the sample in the linked question. I can tell you that when you remove `slice_sample()` you will see why you get that error. If there aren't 10 observations within 2 years it will just error out. The third case has no controls associated within 2 years, for example. (The age is 44.) Additionally, there is very little sample data to see what other issues could arise here. — Kat, Jan 15 '23 at 18:40
Hello, thank you for the comments. I edited the original post, please let me know if the data example is wrong. — Halloumi, Jan 16 '23 at 03:27
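
For reference, the matching described in the question (up to 10 controls per case, within +/- 2 years, each control used at most once) can be sketched in base R roughly as follows. This is only a minimal greedy sketch, not a tested answer from the thread: `match_controls`, `match_id`, and the `toy` data are hypothetical names introduced here, and the column names `Age` and `Group` are taken from the `dput()` output in the question. A case receives fewer than `k` controls when the unused controls inside the age window run out, which may be one reason an attempt returns fewer controls than expected.

```r
# Greedy 1:k age matching without replacement (base R only).
# `match_controls` is a hypothetical helper, not a function from any package.
match_controls <- function(dat, k = 10, caliper = 2, seed = 123) {
  set.seed(seed)
  cases    <- dat[dat$Group == "Case", , drop = FALSE]
  controls <- dat[dat$Group == "Control", , drop = FALSE]
  used     <- rep(FALSE, nrow(controls))   # tracks controls already matched
  out      <- vector("list", nrow(cases))

  for (i in seq_len(nrow(cases))) {
    # Unused controls whose age is within +/- caliper years of this case
    idx <- which(!used & !is.na(controls$Age) &
                   abs(controls$Age - cases$Age[i]) <= caliper)
    n_pick <- min(k, length(idx))
    # Sample positions in idx rather than idx itself: sample(x, ...) treats a
    # single number x as 1:x, which would pick the wrong rows.
    pick <- if (n_pick > 0) idx[sample.int(length(idx), n_pick)] else integer(0)
    used[pick] <- TRUE

    matched <- rbind(cases[i, , drop = FALSE], controls[pick, , drop = FALSE])
    matched$match_id <- i                  # labels the case and its controls
    out[[i]] <- matched
  }
  do.call(rbind, out)
}

# Toy data (made up for illustration): 2 cases, 7 controls, 2 controls per case
toy <- data.frame(
  mcv   = c(100, 98, 99, 98, 101, 87, 90, 98, 99),
  Age   = c(62, 70, 61, 63, 60, 64, 69, 70, 71),
  Group = c("Case", "Case", rep("Control", 7))
)
matched <- match_controls(toy, k = 2)
print(matched)
```

Because the loop is greedy, the result depends on the order of the cases: earlier cases can use up controls that later cases would also have needed. Counting rows per `match_id` (for example with `table(matched$match_id)`) shows which cases ended up with fewer than `k` controls.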
