I would like to recreate this Stata command in R:
by Area Sex Age: keep if (Infected==1) | ((_n<=1*ncases) & (Infected==0))
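A minimal base-R sketch of the same rule, assuming the column names shown in the data below. Under `by`, Stata's `_n` is the observation number within each group. `ave(seq_len(nrow(df)), ..., FUN = seq_along)` restarts the count at 1 in every Area/Sex/Age group without sorting the data. The helper name `keep_matched` and its argument `k` (controls per case) are hypothetical.

```r
# Hypothetical helper (not from the question) mirroring the Stata line
#   by Area Sex Age: keep if (Infected==1) | ((_n<=1*ncases) & (Infected==0))
# k is the number of controls per case (1 in the question).
keep_matched <- function(df, k = 1) {
  # Observation number within each Area/Sex/Age group, like Stata's _n
  # under `by`; rows keep their current order, no sorting is needed.
  n_within <- ave(seq_len(nrow(df)), df$Area, df$Sex, df$Age, FUN = seq_along)
  keep <- df$Infected == 1 | (n_within <= k * df$ncases & df$Infected == 0)
  df[keep, , drop = FALSE]
}
```

Used as `dat5 <- keep_matched(dat4)`. As in the Stata line, `_n` counts cases as well as controls. A control that comes after the first `ncases` rows of its group is therefore never kept.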
This is for a matched case-control study.
My data frame contains 193 cases and a variable number of controls per group (Area, Sex and Age). I am trying to match one random control to each case within its Area/Sex/Age group.
ncases is an integer column in my data frame giving the number of cases in each group (Area, Sex, Age).
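The Stata rule keeps the first controls in the current row order, so the selection is only random if the rows are in random order. The sketch below is a variant, not a translation of the Stata line. It gives each row a random key and ranks the controls among themselves within each group, then keeps every case plus the `k * ncases` lowest-ranked controls. The name `sample_controls` and its `k` and `seed` arguments are hypothetical.

```r
# Hypothetical variant (not the Stata rule): keep every case and, within each
# Area/Sex/Age group, up to k * ncases controls drawn at random.
# Controls are ranked only among themselves, so cases do not use up slots.
sample_controls <- function(df, k = 1, seed = NULL) {
  if (!is.null(seed)) set.seed(seed)   # optional, for reproducible draws
  is_ctrl <- df$Infected == 0
  u <- runif(nrow(df))                 # one random key per row
  # Rank the random keys within group x case/control status
  rnk <- ave(u, df$Area, df$Sex, df$Age, is_ctrl,
             FUN = function(v) rank(v, ties.method = "first"))
  keep <- df$Infected == 1 | (is_ctrl & rnk <= k * df$ncases)
  df[keep, , drop = FALSE]
}
```

Passing `seed` makes the draw repeatable. `set.seed()` also resets the session's global random number stream.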
The Stata command works as intended in Stata.
However, the R code I have written only works for the first group:
# keep all cases, plus controls whose row number is <= ncases
dat5 <- subset(dat4, by = list(Area, Sex, Age), (Infected == 1 |
  ((seq(dim(dat4)[1])) <= 1*ncases & Infected == 0)))
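This small illustration suggests why only the first group gets a control. `seq(dim(dat4)[1])` numbers the rows 1 to N across the whole data frame, while Stata's `_n` restarts at 1 in each group. The vector `grp` is a hypothetical stand-in for the three Area/Sex/Age groups, laid out as in dat4.

```r
# Three Area/Sex/Age groups laid out as in dat4 (3, 3 and 6 rows)
grp <- rep(c("1/1/23", "7/2/50", "5/1/36"), times = c(3, 3, 6))
global_n <- seq(length(grp))                            # like seq(dim(dat4)[1])
within_n <- ave(seq_along(grp), grp, FUN = seq_along)   # like Stata's _n under by
print(rbind(global_n, within_n))
```

With ncases equal to 1, 1 and 2 for these groups, `global_n <= ncases` holds only for row 1. Every later group starts at a row number larger than its ncases, so none of its controls is kept.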
This is my data frame dat4 (Infected = 1 is a case, Infected = 0 is a control):
      Area Sex Age CensusNo Animals Infected ncases
18825    1   1  23  1023224       0        0      1
18826    1   1  23  1024109       1        0      1
18827    1   1  23  1024163       0        1      1
41428    7   2  50  1047107       1        0      1
41429    7   2  50  1047029       1        0      1
41430    7   2  50  1046901       1        1      1
41439    5   1  36  1047037       1        0      2
41440    5   1  36  1047127       1        0      2
41441    5   1  36  1047125       1        0      2
41442    5   1  36  1047005       1        0      2
41443    5   1  36  1046994       0        1      2
41444    5   1  36  1046972       0        1      2