I want to match 2 controls to every case, subject to two conditions: the age difference must be within ±2, and the income difference must be within ±2. If more than 2 controls qualify for a case, I just need to select 2 of them at random.
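The two conditions can be written as one predicate. This is a minimal R sketch that assumes the ±2 bounds are inclusive. The expected matches below imply this: id = 111 is exactly 2 years and 2 income units away from id = 1 and still counts as a match. `is_match` is a hypothetical helper name.

```r
# Hypothetical helper: TRUE when a control lies within +/- tol of the case on
# both age and income. Assumes inclusive bounds; vectorised over its arguments.
is_match <- function(case_age, case_income, ctrl_age, ctrl_income, tol = 2) {
  abs(ctrl_age - case_age) <= tol & abs(ctrl_income - case_income) <= tol
}

# Case id = 1 (age 10, income 35) vs control id = 111 (age 12, income 37)
is_match(10, 35, 12, 37)
#> [1] TRUE

# Case id = 4 (age 11) vs control id = 333 (age 8): the age gap is 3
is_match(11, 35, 8, 33)
#> [1] FALSE
```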
Here is an example:
EXAMPLE
DATA
```r
dat = structure(list(id = c(1, 2, 3, 4, 111, 222, 333, 444, 555, 666,
                            777, 888, 999, 1000),
                     age = c(10, 20, 44, 11, 12, 11, 8, 12, 11, 22, 21, 18, 21, 18),
                     income = c(35, 72, 11, 35, 37, 36, 33, 70, 34, 74, 70, 44, 76, 70),
                     group = c("case", "case", "case", "case", "control", "control",
                               "control", "control", "control", "control", "control",
                               "control", "control", "control")),
                row.names = c(NA, -14L), class = c("tbl_df", "tbl", "data.frame"))

> dat
# A tibble: 14 x 4
      id   age income group
   <dbl> <dbl>  <dbl> <chr>
 1     1    10     35 case
 2     2    20     72 case
 3     3    44     11 case
 4     4    11     35 case
 5   111    12     37 control
 6   222    11     36 control
 7   333     8     33 control
 8   444    12     70 control
 9   555    11     34 control
10   666    22     74 control
11   777    21     70 control
12   888    18     44 control
13   999    21     76 control
14  1000    18     70 control
```
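With this data, the candidate controls for each case can be listed by a cross join of cases with controls followed by a filter on the two conditions. This is a base-R sketch that assumes inclusive ±2 bounds. It rebuilds `dat` as a plain `data.frame` so it runs without tibble, and the names `cases`, `controls`, `pairs` and `eligible` are only illustrative.

```r
# Rebuild the example data as a plain data.frame (same values as `dat` above)
dat <- data.frame(
  id     = c(1, 2, 3, 4, 111, 222, 333, 444, 555, 666, 777, 888, 999, 1000),
  age    = c(10, 20, 44, 11, 12, 11, 8, 12, 11, 22, 21, 18, 21, 18),
  income = c(35, 72, 11, 35, 37, 36, 33, 70, 34, 74, 70, 44, 76, 70),
  group  = c(rep("case", 4), rep("control", 10))
)

cases    <- dat[dat$group == "case", ]
controls <- dat[dat$group == "control", ]

# merge() with by = NULL returns the Cartesian product (every case paired with
# every control); shared column names get the "_case" / "_ctrl" suffixes.
pairs <- merge(cases, controls, by = NULL, suffixes = c("_case", "_ctrl"))

# Keep only pairs within +/-2 on both age and income
pairs <- pairs[abs(pairs$age_ctrl - pairs$age_case) <= 2 &
               abs(pairs$income_ctrl - pairs$income_case) <= 2, ]

# Eligible control ids per case; a case with no eligible control (id = 3)
# simply does not appear in the list
eligible <- split(pairs$id_ctrl, pairs$id_case)
eligible
```

For this data, the sketch yields 111, 222, 333 and 555 for id = 1, 666, 777 and 1000 for id = 2, and 111, 222 and 555 for id = 4. These agree with the candidate tables below.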
EXPECTED OUTCOME
For id = 1, the matched controls are listed below, and I just need to select 2 of them at random.
id | age | income | group |
---|---|---|---|
111 | 12 | 37 | control |
222 | 11 | 36 | control |
333 | 8 | 33 | control |
555 | 11 | 34 | control |
For id = 2, the matched controls are listed below, and I just need to select 2 of them at random.
id | age | income | group |
---|---|---|---|
666 | 22 | 74 | control |
777 | 21 | 70 | control |
1000 | 18 | 70 | control |
For id = 3, there are no matching controls in dat.
For id = 4, the matched controls are listed below, and again I just need to select 2 of them at random.

id | age | income | group |
---|---|---|---|
111 | 12 | 37 | control |
222 | 11 | 36 | control |
555 | 11 | 34 | control |

Note that the matched controls for id = 1 and id = 4 overlap, and I don't want two cases to share a control. If id = 1 chooses id = 111 and id = 222 as its controls, then id = 4 can only choose id = 555; if id = 1 chooses id = 111 and id = 333, then id = 4 can only choose id = 222 and id = 555.
The final output may look like this (the control ids are selected at random from those that meet the conditions):
id | age | income | group |
---|---|---|---|
1 | 10 | 35 | case |
2 | 20 | 72 | case |
3 | 44 | 11 | case |
4 | 11 | 35 | case |
111 | 12 | 37 | control |
222 | 11 | 36 | control |
333 | 8 | 33 | control |
555 | 11 | 34 | control |
777 | 21 | 70 | control |
1000 | 18 | 70 | control |
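One simple way to satisfy the no-sharing rule is greedy matching without replacement. In this approach, the cases are visited in order, and each case draws up to 2 controls at random from the eligible controls that no earlier case has taken. The base-R sketch below is a swapped-in technique rather than an established method from the references. It assumes inclusive ±2 bounds, and its variable names (`used`, `matches`, `final`) and seed are arbitrary.

```r
# Rebuild the example data as a plain data.frame (same values as `dat` above)
dat <- data.frame(
  id     = c(1, 2, 3, 4, 111, 222, 333, 444, 555, 666, 777, 888, 999, 1000),
  age    = c(10, 20, 44, 11, 12, 11, 8, 12, 11, 22, 21, 18, 21, 18),
  income = c(35, 72, 11, 35, 37, 36, 33, 70, 34, 74, 70, 44, 76, 70),
  group  = c(rep("case", 4), rep("control", 10))
)

set.seed(1)    # arbitrary seed, only to make the random draw reproducible
n_ctrl <- 2    # controls wanted per case
tol    <- 2    # assumed inclusive +/- tolerance on age and on income

cases    <- dat[dat$group == "case", ]
controls <- dat[dat$group == "control", ]
used     <- numeric(0)   # control ids already assigned to an earlier case
matches  <- list()

# Greedy pass: each case draws from eligible controls not yet used
for (i in seq_len(nrow(cases))) {
  cs <- cases[i, ]
  ok <- abs(controls$age - cs$age) <= tol &
        abs(controls$income - cs$income) <= tol &
        !(controls$id %in% used)
  pool <- controls$id[ok]
  if (length(pool) == 0) next    # e.g. id = 3 has no eligible control
  # Index via sample.int(): sample(pool, k) would draw from 1:pool
  # whenever pool happened to hold a single number.
  pick <- pool[sample.int(length(pool), min(n_ctrl, length(pool)))]
  used <- c(used, pick)
  matches[[length(matches) + 1]] <- data.frame(case_id = cs$id, control_id = pick)
}
matches <- do.call(rbind, matches)   # one row per (case, control) pair

# Final output in the requested shape: all cases, then the selected controls
final <- rbind(cases, controls[controls$id %in% matches$control_id, ])
final
```

Because cases are processed in a fixed order, a later case can end up with fewer than 2 controls. For example, id = 4 gets only 555 when id = 1 draws 111 and 222, which is the behaviour described above. The `matches` table also records which control belongs to which case, information that the requested final output does not show. If the goal is instead to maximise the number of complete matches, processing cases with the fewest eligible controls first, or trying the ccoptimalmatch package from the references below, may be worth comparing.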
NOTE
I've looked at several websites (see the references below), but none of them meets my needs, and I don't know how to implement these requirements in R.
Any help will be highly appreciated!
References:
1. https://stackoverflow.com/questions/56026700/is-there-any-package-for-case-control-matching-individual-1n-matching-in-r-n
2. Case control matching in R (or spss), based on age, sex and ethnicity?
3. Matching case-controls in R using the ccoptimalmatch package