I want to match 2 controls to every case under two conditions: the `age` difference should be within ±2, and the `income` difference should be within ±2. If a case has more than 2 eligible controls, I just need to select 2 of them at random. How do I then generate a new variable that indicates which case each control is matched to? For example, Control1 and Control2 matched to Case1 are encoded as group 1, and Control1 and Control2 matched to Case2 are encoded as group 2.
DATA
dat = structure(list(id = c(1, 2, 3, 4, 111, 222, 333, 444, 555, 666,
                            777, 888, 999, 1000),
                     age = c(10, 20, 44, 11, 12, 11, 8, 12, 11, 22, 21, 18, 21, 18),
                     income = c(35, 72, 11, 35, 37, 36, 33, 70, 34, 74, 70, 44, 76, 70),
                     group = c("case", "case", "case", "case", "control", "control",
                               "control", "control", "control", "control", "control",
                               "control", "control", "control")),
                row.names = c(NA, -14L), class = c("tbl_df", "tbl", "data.frame"))
EXPECTED OUTPUT
id | age | income | group | index |
---|---|---|---|---|
1 | 10 | 35 | case | 1 |
2 | 20 | 72 | case | 2 |
3 | 44 | 11 | case | 3 |
4 | 11 | 35 | case | 4 |
111 | 12 | 37 | control | 1 |
222 | 11 | 36 | control | 1 |
333 | 8 | 33 | control | 4 |
555 | 11 | 34 | control | 4 |
777 | 21 | 70 | control | 2 |
1000 | 18 | 70 | control | 2 |
This is similar to my previous question, but I want the output to have an extra variable called `index` that indicates which controls are matched to which case. If a case and a control have the same `index`, that control is matched to that case.
The question is how I can create `index`, preferably with an approach based on the previous question.
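
To make the rule concrete, here is a rough base-R sketch of the kind of result I am after. It is only an illustration under assumptions, not the approach from the previous question: `match_controls`, `n_controls`, and `tol` are hypothetical names, the matching is greedy in case order, and each control is used at most once. A later case can therefore end up with fewer than 2 controls if an earlier case has already taken them.

```r
# Hypothetical helper: greedy matching without replacement.
# Assumes `dat` has columns id, age, income and group ("case" / "control").
match_controls <- function(dat, n_controls = 2, tol = 2) {
  dat$index <- NA_integer_
  case_rows <- which(dat$group == "case")
  # Each case gets its own running number as index.
  dat$index[case_rows] <- seq_along(case_rows)

  for (k in seq_along(case_rows)) {
    i <- case_rows[k]
    # Controls not yet used that are within +/- tol on both age and income.
    eligible <- which(dat$group == "control" & is.na(dat$index) &
                        abs(dat$age - dat$age[i]) <= tol &
                        abs(dat$income - dat$income[i]) <= tol)
    # More candidates than needed: pick n_controls of them at random.
    if (length(eligible) > n_controls) {
      eligible <- eligible[sample.int(length(eligible), n_controls)]
    }
    dat$index[eligible] <- k
  }

  # Keep all cases plus the controls that were matched.
  dat[!is.na(dat$index), ]
}
```

Calling `set.seed(1); match_controls(dat)` on the data above would keep the four cases and attach the matched controls to them through `index`. Which controls are picked depends on the seed, and cases with no eligible control (such as id 3 here) keep their `index` but have no matched rows.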