randomizing subsets of dataframe and replicate function saving results for each replicate

Question

I have a dataframe (df2) like this:

locus    transect  fq     d
Locus_1  A         0.000  20
Locus_1  A         0.000  35
Locus_1  A         0.000  50
Locus_2  A         0.200  20
Locus_2  A         0.083  35
Locus_2  A         0.125  50
Locus_3  A         0.134  20
Locus_3  A         0.208  35
Locus_3  A         0.218  50
Locus_4  A         0.000  20
Locus_4  A         0.000  35
Locus_4  A         0.000  50
Locus_5  A         0.100  20
Locus_5  A         0.000  35
Locus_5  A         0.038  50
...

Basically, each locus is sampled three times along a transect, at different distances from the centre. There are thousands of loci. From this dataset, I calculate the correlation between frequency (fq) and distance (d).
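
For reference, a minimal base R sketch of that per-locus correlation, using only the five loci shown above. The object name `rho_by_locus` is made up for illustration, and base R is used instead of plyr/dplyr just to keep the example self-contained. Loci whose fq is constant give NA (with a "standard deviation is zero" warning), which is why the code further down replaces NA with zero.

```r
# Toy data copied from the sample rows above.
df2 <- data.frame(
  locus    = rep(paste0("Locus_", 1:5), each = 3),
  transect = "A",
  fq       = c(0.000, 0.000, 0.000,
               0.200, 0.083, 0.125,
               0.134, 0.208, 0.218,
               0.000, 0.000, 0.000,
               0.100, 0.000, 0.038),
  d        = rep(c(20, 35, 50), times = 5)
)

# Spearman correlation between fq and d, one value per locus.
# cor() returns NA (and warns) when fq has zero variance within a locus.
rho_by_locus <- sapply(split(df2, df2$locus), function(g) {
  suppressWarnings(cor(g$fq, g$d, method = "spearman"))
})

# Same convention as in the question: treat undefined correlations as zero.
rho_by_locus[is.na(rho_by_locus)] <- 0
```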

The next steps are:

Randomize the position of each locus (so the first three rows, the second group of three rows, and so on) and calculate a new correlation. Basically, I want to shuffle the d values (20-35-50) within each locus.
Do this 1000 times.
Save the results for each replicate.
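
To make the shuffling step concrete, here is a rough sketch of what "shuffle d within each locus" could look like in base R. The helper name `shuffle_d` and the two-locus toy data are assumptions for illustration only. The point is that d is permuted while fq stays where it is. Reordering whole rows within a group keeps every fq paired with its original d, so the correlation would not change.

```r
# Two loci taken from the sample rows above.
df2 <- data.frame(
  locus    = rep(c("Locus_2", "Locus_3"), each = 3),
  transect = "A",
  fq       = c(0.200, 0.083, 0.125, 0.134, 0.208, 0.218),
  d        = rep(c(20, 35, 50), times = 2)
)

# Hypothetical helper: permute d within each transect/locus group,
# leaving the fq column untouched.
# (Assumes every group has more than one row, as in the data above.)
shuffle_d <- function(df) {
  df$d <- ave(df$d, df$transect, df$locus, FUN = sample)
  df
}

set.seed(1)                      # only for reproducibility of the sketch
shuffled <- shuffle_d(df2)

# Each locus still has the distances 20, 35 and 50, but possibly in a new order.
split(shuffled$d, shuffled$locus)
```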

I am trying to use mainly plyr and dplyr.

This is the code I came up with:

df3 <- group_by(df2, transect, locus) #setting up groups to which apply functions


data <- replicate(1000, {
  test <- sample_n(df3, 3, replace=F) #shuffle by group
  Rho <- ddply(test, .(transect, locus), summarise, corr= cor(fq, d, method = "spearman")) #calculate correlation
  Rho[is.na(Rho)] <- 0 #replacing missing values with zero
  Rho_mean_bylocus <- ddply(Rho, .(locus), summarise, mean=mean(corr))  #average correlation over transect
  }, simplify = TRUE)

This is what the results look like:

 [,1]        [,2]        [,3]        [,4]       
locus factor,978  factor,978  factor,978  factor,978 
mean  Numeric,978 Numeric,978 Numeric,978 Numeric,978
       [,5]        [,6]        [,7]        [,8]       
locus factor,978  factor,978  factor,978  factor,978 
mean  Numeric,978 Numeric,978 Numeric,978 Numeric,978
      [,9]        [,10]      
locus factor,978  factor,978 
mean  Numeric,978 Numeric,978

(I have 978 loci).
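
As far as I can tell, the `factor,978` / `Numeric,978` cells appear because each replicate returns a data frame and `simplify = TRUE` collapses those data frames into a matrix of list cells. A small sketch of the alternative, `simplify = FALSE`, which keeps a plain list of data frames that can be stacked afterwards. `one_replicate` is a made-up stand-in for the real per-replicate computation:

```r
# Hypothetical stand-in for one replicate's result: a small data frame
# with one mean correlation per locus (random numbers here).
one_replicate <- function() {
  data.frame(locus = c("Locus_1", "Locus_2"), mean = runif(2))
}

set.seed(1)
# simplify = FALSE returns a list with one data frame per replicate.
res_list <- replicate(3, one_replicate(), simplify = FALSE)

# Tag each data frame with its replicate number, then stack them row-wise.
res_long <- do.call(
  rbind,
  Map(function(df, i) data.frame(replicate = i, df),
      res_list, seq_along(res_list))
)
```

The long format keeps one row per locus per replicate, so results for each replicate stay identifiable.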

I tried to embed replicate() in a function:

rand.rho <- function(x) {  # I have also tried without using a function, but it still does not work
  data <- replicate(1000, {
    test <- sample_n(df3, 3, replace=F) #shuffle
    Rho <- ddply(test, .(transect, locus), summarise, corr= cor(fq, d, method = "spearman")) #calculate correlation
    Rho[is.na(Rho)] <- 0 #replacing missing values with zero
    Rho_mean_bylocus <- ddply(Rho, .(locus), summarise, mean=mean(corr)) #average correlation over transect
  }, simplify = TRUE)
}

df4 <- rand.rho(df3)

but I get an error:

Error in list_to_dataframe(res, attr(.data, "split_labels"), .id, id_as_factor) : 
Results must be all atomic, or all data frames
In addition: There were 50 or more warnings (use warnings() to see the first 50)

I am at a loss.

I have already looked at other answers on here and tried to implement their suggestions, but it is still not working.

Any advice?

I made a small step forward (hopefully) by using do.call: `do.call(rbind, rlply(5, rand.rho(df3)))`. Now I get my list of 5 values for each locus, but they are all the same, so something in the shuffling doesn't work — IlaC, May 07 '15 at 12:59
I have to add for clarity that if I use `do.call()` then I get rid of `replicate()` — IlaC, May 07 '15 at 13:05
Can you please post sample of input and output as data.frame (....) ? — vagabond, May 07 '15 at 13:24
You are more likely to get an answer if you give sample data df2, and make a simple reproducible example as explained in hadley's answer here http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example — Paul Rougieux, May 07 '15 at 13:33
Thank you @Paul4forest. I will read through and try to add the info here — IlaC, May 08 '15 at 11:43
Instead of replicate(), try using group_by() and do() in the dplyr package. For example the do help page: ` mtcars %>% group_by(cyl) %>% do(head(.,2)) ` — Paul Rougieux, May 11 '15 at 08:19

