Sampling out of tables depending on other variables (R)

Question

I am a physician who just started working in R and would appreciate any help with this question: I have 2 tables (A, B) with the variables age (continuous), sex (binary) and test_value (binary). Each table has a different age and sex distribution.

set.seed(10)
AgeA <- round(rnorm(100, mean = 40, sd = 15))
SexA <- sample(c("M","F"), 100, replace = TRUE, prob = c(0.5, 0.5))
Test_ValueA <- rbinom(100, 1, 0.3)

set.seed(20)
AgeB <- round(rnorm(1000, mean = 50, sd = 15))
SexB <- sample(c("M","F"), 1000, replace = TRUE, prob = c(0.5, 0.5))
Test_ValueB <- rbinom(1000, 1, 0.4)

A <- data.frame(Age = AgeA, Sex = SexA, Test = Test_ValueA)
B <- data.frame(Age = AgeB, Sex = SexB, Test = Test_ValueB)

genderA<-(prop.table(table(A[,2])))
TestA<-(prop.table(table(A[,3])))
paste("median age in group A is",median(A[,1]), "percentage female in group A is",genderA[1], "percentage of test positive in A is", TestA[2])

genderB<-(prop.table(table(B[,2])))
TestB<-(prop.table(table(B[,3])))
paste("median age in group B is",median(B[,1]), "percentage female in group B is",genderB[1], "percentage of test positive in B is", TestB[2])
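Writing the summary once as a function and indexing columns by name avoids copy-pasting the group label for each table. A minimal sketch (describe_group is a hypothetical helper name, not from any package); it regenerates A and B exactly as above so it runs on its own:

```r
# Recreate the question's example data (same seeds and call order as above)
set.seed(10)
A <- data.frame(Age  = round(rnorm(100, mean = 40, sd = 15)),
                Sex  = sample(c("M", "F"), 100, replace = TRUE, prob = c(0.5, 0.5)),
                Test = rbinom(100, 1, 0.3))
set.seed(20)
B <- data.frame(Age  = round(rnorm(1000, mean = 50, sd = 15)),
                Sex  = sample(c("M", "F"), 1000, replace = TRUE, prob = c(0.5, 0.5)),
                Test = rbinom(1000, 1, 0.4))

# Hypothetical helper: one summary line per group, using column names
describe_group <- function(df, label) {
  sprintf("group %s: median age %.1f, proportion female %.2f, proportion test positive %.2f",
          label, median(df$Age), mean(df$Sex == "F"), mean(df$Test == 1))
}

describe_group(A, "A")
describe_group(B, "B")
```

mean() of a logical vector is the proportion of TRUE values, so mean(df$Sex == "F") is the same number as the "F" entry of prop.table(table(df$Sex)).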

The difference in test proportion is therefore confounded by age and sex. Now I would like to match the patients from table A with those in table B to adjust for age and sex. Because B is the smaller cohort, I would prefer to sample out of A and match to B. Is the match package an option? Any other ideas?

Hopefully I was able to explain my problem. Any hints as to which functions might help?

Welcome to SO. Please read [(1)](http://stackoverflow.com/help/how-to-ask) how do I ask a good question, [(2)](http://stackoverflow.com/help/mcve) How to create a MCVE as well as [(3)](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example#answer-5963610) how to provide a minimal reproducible example in R. — Christoph, Sep 02 '16 at 19:15

score 0 · Answer 1 · answered Sep 02 '16 at 20:15

Hello, I have a possible answer. I will build two populations of 100 people with the characteristics you described:

set.seed(10)
AgeA <- rnorm(100, mean = 30, sd = 10)
# population A is 50% male
SexA <- sample(c("M","F"), 100, replace = TRUE, prob = c(0.5, 0.5))
Test_ValueA <- rbinom(100, 1, 0.5)

set.seed(20)
AgeB <- rnorm(100, mean = 30, sd = 10)
# population B is 80% male
SexB <- sample(c("M","F"), 100, replace = TRUE, prob = c(0.8, 0.2))
Test_ValueB <- rbinom(100, 1, 0.3)

A <- data.frame(Age = AgeA, Sex = SexA, Test = Test_ValueA)
B <- data.frame(Age = AgeB, Sex = SexB, Test = Test_ValueB)

Then using dplyr you can summarise population B parameters:

library(dplyr)

Bsummary <- group_by(B,Sex)

Bsummary <- summarise(Bsummary, PercentagePositive = mean(Test == 1), PercentageSex = n()/nrow(B))

Bsummary

If you look at the results, B is 76% male and 24% female, so if you sampled 20 people from A you would have to sample 15 males and 5 females (0.76 × 20 ≈ 15 and 0.24 × 20 ≈ 5). First, separate population A into males and females:

Amale <- filter(A, Sex == "M")
Afemale <- filter(A, Sex == "F")

Then, from those, sample 15 males and 5 females:

SampleAMale <- Amale[sample(nrow(Amale), 15), ]

SampleAFemale <- Afemale[sample(nrow(Afemale), 5), ]

Then combine them and summarise the result:

sampleA <- rbind(SampleAMale, SampleAFemale)

ASampleSummary <- group_by(sampleA,Sex)

ASampleSummary <- summarise(ASampleSummary, PercentagePositive = mean(Test == 1), PercentageSex = n()/nrow(sampleA))
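The 15/5 split above is worked out by hand. A minimal sketch that derives the per-sex sample sizes from B's proportions and draws them from A in one step; n_sample, props and target are illustrative names, and the populations are regenerated as in this answer so the block runs on its own:

```r
library(dplyr)

# Recreate this answer's example populations
set.seed(10)
A <- data.frame(Age  = rnorm(100, mean = 30, sd = 10),
                Sex  = sample(c("M", "F"), 100, replace = TRUE, prob = c(0.5, 0.5)),
                Test = rbinom(100, 1, 0.5))
set.seed(20)
B <- data.frame(Age  = rnorm(100, mean = 30, sd = 10),
                Sex  = sample(c("M", "F"), 100, replace = TRUE, prob = c(0.8, 0.2)),
                Test = rbinom(100, 1, 0.3))

# How many of each sex a sample of n_sample people needs to mirror B
n_sample <- 20
props <- prop.table(table(B$Sex))
target <- setNames(as.integer(round(n_sample * props)), names(props))

# Draw that many people of each sex from A, without replacement
set.seed(1)
sampleA <- bind_rows(lapply(names(target), function(s) {
  pool <- filter(A, Sex == s)
  pool[sample(nrow(pool), target[[s]]), ]
}))

# Same summary as above, with the denominator taken from the sample itself
sampleA %>%
  group_by(Sex) %>%
  summarise(PropPositive = mean(Test == 1),
            PropSex = n() / nrow(sampleA))
```

Because target is computed from B, changing n_sample or the comparison population does not require redoing the arithmetic by hand.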

This is a very elegant way to do it for sex! However, I would like to sample not only by sex but also by age, e.g. have exactly the same distribution of 30-year-old females, 23-year-old males, etc. Any idea? — fank, Sep 02 '16 at 22:27
@fank hello, I think I could do it. However, I think it would be better to use age as a covariate. If it is not too much to ask, could you tell me the question you are trying to answer? I might be able to help you; I think you might not need to do that. If you want, I can send you my e-mail. — Derek Corcoran, Sep 02 '16 at 22:35
I am trying to compare the binary outcome test_value between 3 groups (Fisher's exact test), followed by subgroup analyses between groups A-C, A-B and B-C. B and C are my study populations with limited sample size, whereas A is a big cohort. Is there any way I could adjust for age and sex? — fank, Sep 02 '16 at 23:35
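Using age (and sex) as covariates, as suggested above, can be sketched with a logistic regression (glm with family = binomial). This is a different technique from the sampling in the answers: instead of making the groups look alike, it estimates the group difference while holding age and sex fixed. A minimal sketch on the question's A and B; the stacked AB table and its Group column are illustrative, and a third cohort C would simply be another level of Group:

```r
# Recreate the question's example data
set.seed(10)
A <- data.frame(Age  = round(rnorm(100, mean = 40, sd = 15)),
                Sex  = sample(c("M", "F"), 100, replace = TRUE, prob = c(0.5, 0.5)),
                Test = rbinom(100, 1, 0.3))
set.seed(20)
B <- data.frame(Age  = round(rnorm(1000, mean = 50, sd = 15)),
                Sex  = sample(c("M", "F"), 1000, replace = TRUE, prob = c(0.5, 0.5)),
                Test = rbinom(1000, 1, 0.4))

# Stack the cohorts with a group label
AB <- rbind(cbind(A, Group = "A"), cbind(B, Group = "B"))
AB$Group <- factor(AB$Group)
AB$Sex <- factor(AB$Sex)

# Crude (unadjusted) comparison, as originally planned
crude <- fisher.test(table(AB$Group, AB$Test))
crude$p.value

# Age and sex as covariates; GroupB is the log odds ratio of a
# positive test in B versus A, adjusted for age and sex
fit <- glm(Test ~ Group + Age + Sex, family = binomial, data = AB)
summary(fit)$coefficients

# Adjusted odds ratio with a Wald 95% confidence interval
exp(cbind(OR = coef(fit), confint.default(fit)))["GroupB", ]
```

Comparing the crude Fisher result with the adjusted GroupB odds ratio shows how much of the raw difference is explained by the age and sex imbalance.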

score 0 · Answer 2 · Derek Corcoran · 2016-09-04T02:22:37.797

OK fank, I think you will like this answer a little better. The first part is the same, except that age is now rounded (and has a smaller SD of 2):

set.seed(10)
AgeA <- round(rnorm(100, mean = 30, sd = 2))
# population A is 50% male
SexA <- sample(c("M","F"), 100, replace = TRUE, prob = c(0.5, 0.5))
Test_ValueA <- rbinom(100, 1, 0.5)

set.seed(20)
AgeB <- round(rnorm(100, mean = 30, sd = 2))
# population B is 80% male
SexB <- sample(c("M","F"), 100, replace = TRUE, prob = c(0.8, 0.2))
Test_ValueB <- rbinom(100, 1, 0.3)

A <- data.frame(Age = AgeA, Sex = SexA, Test = Test_ValueA)
B <- data.frame(Age = AgeB, Sex = SexB, Test = Test_ValueB)

Now you can use prop.table to get the joint age-by-sex proportions of a population. Let's say you want to sample 1000 individuals from B in the same proportions as A in terms of age and sex; the expected number of people in each age-sex cell is:

1000*(prop.table(table(A[,1:2])))
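The numbers produced by that line are expected counts and are in general not whole numbers. A minimal sketch that turns them into one row per age-sex cell with an integer target N (targets is an illustrative name), regenerating A as in this answer:

```r
# Recreate group A from this answer
set.seed(10)
A <- data.frame(Age  = round(rnorm(100, mean = 30, sd = 2)),
                Sex  = sample(c("M", "F"), 100, replace = TRUE, prob = c(0.5, 0.5)),
                Test = rbinom(100, 1, 0.5))

# Expected number of people per Age x Sex cell in a sample of 1000
expected <- 1000 * prop.table(table(A[, 1:2]))

# One row per cell; table dimnames are character, so convert Age back to numeric
targets <- as.data.frame(expected, responseName = "Expected", stringsAsFactors = FALSE)
targets$Age <- as.numeric(targets$Age)
targets$N <- round(targets$Expected)
targets <- targets[targets$N > 0, ]   # drop age-sex cells that do not occur in A

targets
sum(targets$N)   # in general, rounding can make this differ slightly from 1000
```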

Then, by applying filters, you can sample within groups. For example, if you want only the 30-year-old males in group B, you could do:

library(dplyr)
BMale30 <- filter(B, Sex == "M" & Age == 30)
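Repeating that filter for every age-sex cell is what the final step amounts to. A minimal sketch that loops over the cells of A, filters B to each one and samples the target number of people (targets, strata, sampleB and unmatched are illustrative names). Two assumptions to flag: it samples with replacement, because B in this example has only 100 people and cannot supply 1000 without repeats, which duplicates individuals; and cells that exist in A but are empty in B cannot be filled, so they are only reported:

```r
library(dplyr)

# Recreate this answer's example populations
set.seed(10)
A <- data.frame(Age  = round(rnorm(100, mean = 30, sd = 2)),
                Sex  = sample(c("M", "F"), 100, replace = TRUE, prob = c(0.5, 0.5)),
                Test = rbinom(100, 1, 0.5))
set.seed(20)
B <- data.frame(Age  = round(rnorm(100, mean = 30, sd = 2)),
                Sex  = sample(c("M", "F"), 100, replace = TRUE, prob = c(0.8, 0.2)),
                Test = rbinom(100, 1, 0.3))

# Integer target count per Age x Sex cell, following A's distribution
targets <- as.data.frame(1000 * prop.table(table(A[, 1:2])),
                         responseName = "Expected", stringsAsFactors = FALSE)
targets$Age <- as.numeric(targets$Age)
targets$N <- round(targets$Expected)
targets <- targets[targets$N > 0, ]

set.seed(1)
strata <- lapply(seq_len(nrow(targets)), function(i) {
  # The same filter as BMale30 above, applied to every cell in turn
  pool <- filter(B, Sex == targets$Sex[i] & Age == targets$Age[i])
  if (nrow(pool) == 0) return(NULL)   # cell present in A but empty in B
  # replace = TRUE because this B has fewer people than the target size
  pool[sample(nrow(pool), targets$N[i], replace = TRUE), ]
})
sampleB <- bind_rows(strata)   # NULL entries are dropped

# Cells of A that B cannot match
unmatched <- targets[vapply(strata, is.null, logical(1)), ]
unmatched
```

When the cohort being sampled is the large one, as in the real data described above, replace = FALSE avoids duplicates as long as every cell has enough people.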

edited Sep 04 '16 at 02:22

answered Sep 02 '16 at 23:58

Derek Corcoran


Hi, this is very useful and also helped me a lot to improve my initial question (see above). Do you think the match package can adjust for age AND sex? I am a bit unsure how to use the filters as the final step, as you proposed. Thank you so much for your great help so far! – fank Sep 03 '16 at 13:27
Hello fank, you could use a filter like this for each age: Amale30 <- filter(A, Sex == "M" & Age == 30). That said, Fisher's exact test and the chi-square test already compare the groups as they are, so I am not completely sure you need to make the groups equal. I encourage you to look at this link: https://cran.r-project.org/web/packages/vcdExtra/vignettes/vcd-tutorial.pdf I think it will help you more than Match. I am here to help if you need it. – Derek Corcoran Sep 04 '16 at 02:16
