I have one data frame in R, which I split into two subsets:

p<-c(3.14,3.56,7.45,8.33,5.44,3.12,3.78,7.62,9.12,4.34,6.78,8.65,6.99)
n<-c("mQTL","mQTL","null","null","null","null","null","null","null","null","null","null","null")
s<-c(2,2,1,2,1,1,2,2,2,1,2,1,2)
g<-c("female","male","female","male","female","female","male","female","female","male","female","female","female")
df<-data.frame(n,g,s,p)
df


mQTL<-subset(df,df$n=='mQTL')

mQTL

     n      g s    p
1 mQTL female 2 3.14
2 mQTL   male 2 3.56


null<-subset(df,df$n=="null")

null

      n      g s    p
3  null female 1 7.45
4  null   male 2 8.33
5  null female 1 5.44
6  null female 1 3.12
7  null   male 2 3.78
8  null female 2 7.62
9  null female 2 9.12
10 null   male 1 4.34
11 null female 2 6.78
12 null female 1 8.65
13 null female 2 6.99

I want to randomly draw two rows from null, each one matching one of the two mQTL rows on gender (df$g) and number (df$s).

For example, I want something like this for the first random draw:

   n      g s    p
null female 2 7.62
null   male 2 3.78

And for the second random draw:

   n      g s    p
null female 2 9.12
null   male 2 3.78

I want to repeat this random draw 5 times, for example, to get 5 different combinations.

I tried:

null[which((mQTL$g==null$g)& (mQTL$s==null$s)),]

but it returned every matching row in one data frame, not two rows per combination:

      n      g s    p
4  null   male 2 8.33
9  null female 2 9.12
11 null female 2 6.78
13 null female 2 6.99
dizue
  • I don't understand. Why would 8.33 be used for the male row? – kgui Mar 03 '17 at 22:14
  • I made up the data, so you don't need to interpret the actual values. My real data frame is much larger than this: I have 4000 mQTLs, and null has 10000 rows to sample from. I want to randomly select 4000 rows from null so that each one has the same 'gender' and 'number' (s column) as its mQTL. They only need to match on those criteria. – dizue Mar 03 '17 at 22:27
  • You might want to read http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example/28481250#28481250 It's kind of a pain to reproduce your example as you have it here. – Frank Mar 03 '17 at 22:38
  • Hi, sorry, I have provided the R code now. I am new to this forum, thanks for the link! – dizue Mar 03 '17 at 22:51

2 Answers

Try using the merge() function:

merge(mQTL, null, by.x = c("g","s"), by.y = c("g","s"))

but you might want to rename the columns to make things clearer.
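
Note that merge() on its own returns every matching pair, not a random draw. As a rough sketch of one way to turn it into a draw (not necessarily what the asker needs at scale), one merged row can be sampled per mQTL row. The `id` column, the `pick` helper and the suffixes are names made up for this illustration.

```r
# Data from the question
p <- c(3.14,3.56,7.45,8.33,5.44,3.12,3.78,7.62,9.12,4.34,6.78,8.65,6.99)
n <- c("mQTL","mQTL", rep("null", 11))
s <- c(2,2,1,2,1,1,2,2,2,1,2,1,2)
g <- c("female","male","female","male","female","female","male",
       "female","female","male","female","female","female")
df <- data.frame(n, g, s, p, stringsAsFactors = FALSE)
mQTL <- subset(df, n == "mQTL")
null <- subset(df, n == "null")

# Hypothetical helper column so each mQTL row can be tracked through the merge
mQTL$id <- seq_len(nrow(mQTL))

# All (mQTL row, null row) pairs that agree on g and s;
# the suffixes keep the n and p columns of both sides apart
pairs <- merge(mQTL, null, by = c("g", "s"), suffixes = c(".mQTL", ".null"))

# Keep one randomly chosen pair per mQTL row
# (sample.int avoids the sample() surprise when only one candidate exists)
pick <- function(rows) rows[sample.int(length(rows), 1)]
chosen <- pairs[unlist(lapply(split(seq_len(nrow(pairs)), pairs$id), pick)), ]
chosen
```

With 4000 mQTL rows and 10000 null rows the merged table can get large, so this is mainly useful for seeing which pairs are eligible.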

kgui
mQTL = subset(df,df$n=='mQTL')
null = subset(df,df$n=='null')

# Check if the combination of null$g and null$s matches with that of mQTL$g and mQTL$s
null$match = paste(null$g, null$s) %in% paste(mQTL$g, mQTL$s)

# Random sample of two of the matched rows
null[sample(which(null$match), 2),]

# > null[sample(which(null$match), 2),]
#       n      g s    p match
# 13 null female 2 6.99  TRUE
# 4  null   male 2 8.33  TRUE
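
Since sample() is random, rerunning the line above will generally return different rows. If you need the draw to be repeatable, a minimal sketch is to fix the seed first with set.seed(); the seed value 1 here is arbitrary, and `first`/`second` are just illustrative names.

```r
# Data from the question
p <- c(3.14,3.56,7.45,8.33,5.44,3.12,3.78,7.62,9.12,4.34,6.78,8.65,6.99)
n <- c("mQTL","mQTL", rep("null", 11))
s <- c(2,2,1,2,1,1,2,2,2,1,2,1,2)
g <- c("female","male","female","male","female","female","male",
       "female","female","male","female","female","female")
df <- data.frame(n, g, s, p, stringsAsFactors = FALSE)
mQTL <- subset(df, n == "mQTL")
null <- subset(df, n == "null")

# Flag null rows whose (g, s) combination also occurs in mQTL
null$match <- paste(null$g, null$s) %in% paste(mQTL$g, mQTL$s)

# Same seed before each draw, so both draws pick the same rows
set.seed(1)
first <- null[sample(which(null$match), 2), ]
set.seed(1)
second <- null[sample(which(null$match), 2), ]

identical(first, second)  # TRUE
```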

To draw 5 times, run a for loop and store the draws in a list:

draws = list()
for(ii in 1:5){
  draws[[ii]] = null[sample(which(null$match), 2),]
}

# > draws
# [[1]]
#       n      g s    p match
# 4  null   male 2 8.33  TRUE
# 13 null female 2 6.99  TRUE
# 
# [[2]]
#       n      g s    p match
# 11 null female 2 6.78  TRUE
# 9  null female 2 9.12  TRUE
# 
# [[3]]
#       n     g s    p match
# 9 null female 2 9.12  TRUE
# 8 null female 2 7.62  TRUE
# 
# [[4]]
#       n      g s    p match
# 13 null female 2 6.99  TRUE
# 4  null   male 2 8.33  TRUE
# 
# [[5]]
#       n     g s    p match
# 7 null   male 2 3.78  TRUE
# 8 null female 2 7.62  TRUE
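
One caveat: because the draw above samples from the pooled matches, it can return two rows of the same gender (as in draws 2 and 3). If you need exactly one null row per mQTL row, a possible sketch is to sample separately for each mQTL row. `draw_matched` is a hypothetical helper name, and this version assumes it is acceptable for the same null row to be picked for two mQTL rows that share g and s.

```r
# Data from the question
p <- c(3.14,3.56,7.45,8.33,5.44,3.12,3.78,7.62,9.12,4.34,6.78,8.65,6.99)
n <- c("mQTL","mQTL", rep("null", 11))
s <- c(2,2,1,2,1,1,2,2,2,1,2,1,2)
g <- c("female","male","female","male","female","female","male",
       "female","female","male","female","female","female")
df <- data.frame(n, g, s, p, stringsAsFactors = FALSE)
mQTL <- subset(df, n == "mQTL")
null <- subset(df, n == "null")

# Hypothetical helper: for each mQTL row, pick one random null row
# that has the same g and s
draw_matched <- function(mQTL, null) {
  idx <- vapply(seq_len(nrow(mQTL)), function(i) {
    candidates <- which(null$g == mQTL$g[i] & null$s == mQTL$s[i])
    if (length(candidates) == 0) return(NA_integer_)  # no matching null row
    candidates[sample.int(length(candidates), 1)]
  }, integer(1))
  null[idx, ]
}

# Five independent draws, stored in a list (replaces the explicit for loop)
draws <- replicate(5, draw_matched(mQTL, null), simplify = FALSE)
draws[[1]]
```

Each element of `draws` then has one row per mQTL row, in the same order as mQTL, so row i of a draw matches row i of mQTL on g and s.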
acylam