I have a dataframe (df2) like this:
locus transect fq d
Locus_1 A 0.000 20
Locus_1 A 0.000 35
Locus_1 A 0.000 50
Locus_2 A 0.200 20
Locus_2 A 0.083 35
Locus_2 A 0.125 50
Locus_3 A 0.134 20
Locus_3 A 0.208 35
Locus_3 A 0.218 50
Locus_4 A 0.000 20
Locus_4 A 0.000 35
Locus_4 A 0.000 50
Locus_5 A 0.100 20
Locus_5 A 0.000 35
Locus_5 A 0.038 50 ...
Basically, each locus is sampled three times along a transect, at different distances from the centre. There are thousands of loci. From this dataset, I calculate the correlation between frequency (fq) and distance (d).
The next steps are:
- randomize the position of each locus (so the first three rows, the second group of three rows, and so on) and calculate a new correlation. Basically, I want to shuffle the d values (20-35-50) within each locus.
- do this 1000 times
- save the results for each replicate
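To make the steps above concrete, here is a minimal base-R sketch of the procedure I have in mind, run on a toy data frame built from a few of the rows shown above. The names `toy`, `perm_rho` and `n_rep` are made up for illustration, and the sketch assumes every transect/locus group has at least two rows (`sample()` behaves differently on a length-one numeric vector). It is not meant as the definitive way to do this, only as a statement of the intended result.

```r
# Toy data in the same layout as df2 (values copied from the rows shown above).
toy <- data.frame(
  locus    = rep(c("Locus_1", "Locus_2", "Locus_3", "Locus_5"), each = 3),
  transect = "A",
  fq       = c(0.000, 0.000, 0.000,
               0.200, 0.083, 0.125,
               0.134, 0.208, 0.218,
               0.100, 0.000, 0.038),
  d        = rep(c(20, 35, 50), times = 4)
)

# Hypothetical helper: shuffle d within each transect/locus group, then
# compute the Spearman correlation between fq and the shuffled d per group.
perm_rho <- function(dat) {
  groups <- split(dat, list(dat$transect, dat$locus), drop = TRUE)
  vapply(groups, function(g) {
    # sample() on a vector of length > 1 returns a random permutation of it
    r <- suppressWarnings(cor(g$fq, sample(g$d), method = "spearman"))
    # a constant fq gives NA; replace it with zero, as in my own code
    if (is.na(r)) 0 else r
  }, numeric(1))
}

set.seed(1)    # only so that the toy run is reproducible
n_rep <- 1000  # number of replicates

# One row per transect/locus group, one column per replicate.
perm <- replicate(n_rep, perm_rho(toy))
dim(perm)
```

Each column of `perm` holds one replicate, so the result for every replicate is kept and can be averaged or compared with the observed correlation afterwards.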
I am trying to use mainly plyr and dplyr.
This is the code I came up with:
df3 <- group_by(df2, transect, locus) # set up the groups to which the functions are applied
data <- replicate(1000, {
  test <- sample_n(df3, 3, replace = FALSE) # shuffle by group
  Rho <- ddply(test, .(transect, locus), summarise, corr = cor(fq, d, method = "spearman")) # calculate correlation
  Rho[is.na(Rho)] <- 0 # replace missing values with zero
  Rho_mean_bylocus <- ddply(Rho, .(locus), summarise, mean = mean(corr)) # average correlation over transects
}, simplify = TRUE)
This is what the results look like:
[,1] [,2] [,3] [,4]
locus factor,978 factor,978 factor,978 factor,978
mean Numeric,978 Numeric,978 Numeric,978 Numeric,978
[,5] [,6] [,7] [,8]
locus factor,978 factor,978 factor,978 factor,978
mean Numeric,978 Numeric,978 Numeric,978 Numeric,978
[,9] [,10]
locus factor,978 factor,978
mean Numeric,978 Numeric,978
(I have 978 loci).
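As far as I can tell, this shape comes from `simplify = TRUE`: each replicate returns a data frame, which is a list of columns, so `replicate()` packs the columns into a matrix whose cells are whole vectors. The sketch below reproduces that on made-up data and shows that `simplify = FALSE` gives a plain list of data frames instead, which can then be stacked. The helper `one_rep` and the object names are hypothetical and only stand in for one replicate of my real code.

```r
# Hypothetical stand-in for one replicate: a small data frame with the
# same two columns (locus, mean) as my real per-replicate result.
one_rep <- function() {
  data.frame(locus = c("Locus_1", "Locus_2"), mean = runif(2))
}

# simplify = TRUE: the columns of each data frame become cells of a matrix.
simplified <- replicate(4, one_rep(), simplify = TRUE)
dim(simplified)        # rows are the columns locus and mean, one column per replicate
is.list(simplified)    # the matrix cells hold whole vectors

# simplify = FALSE: a plain list with one data frame per replicate.
as_list <- replicate(4, one_rep(), simplify = FALSE)

# Stack the replicates into one long data frame with a replicate index.
stacked <- do.call(
  rbind,
  Map(function(df, i) cbind(replicate = i, df), as_list, seq_along(as_list))
)
str(stacked)
```

With the long format, every replicate stays identifiable through the `replicate` column, which is what I mean by saving the results for each replicate.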
I tried to embed replicate() in a function:
rand.rho <- function(x) { # I have also tried without using a function, but it still does not work
  data <- replicate(1000, {
    test <- sample_n(df3, 3, replace = FALSE) # shuffle
    Rho <- ddply(test, .(transect, locus), summarise, corr = cor(fq, d, method = "spearman")) # calculate correlation
    Rho[is.na(Rho)] <- 0 # replace missing values with zero
    Rho_mean_bylocus <- ddply(Rho, .(locus), summarise, mean = mean(corr)) # average correlation over transects
  }, simplify = TRUE)
}
df4 <- rand.rho(df3)
but I get an error:
Error in list_to_dataframe(res, attr(.data, "split_labels"), .id, id_as_factor) :
Results must be all atomic, or all data frames
In addition: There were 50 or more warnings (use warnings() to see the first 50)
I am at a loss.
I have already looked at other answers on here and tried to implement their suggestions, but it is still not working.
Any advice?