First, the data begin in the form of a data frame, df. I have converted df to a species-by-plot matrix, mat (figuring it will be easier to work from this format). Species are rows and plots are columns; each cell is the frequency with which that species was found in that plot.
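set.seed(3421)
df <- data.frame(plot = as.factor(c(rep(1, 4), rep(2, 4), rep(3, 3), rep(4, 2),
                                    rep(5, 6), rep(6, 7))),
                 species = sample(letters[1:26], size = 26, replace = TRUE))
library("tidyverse")
library("reshape2")  # dcast() comes from reshape2, not from the tidyverse
# Count how often each species occurs in each plot
df <-
  df %>%
  group_by(plot, species) %>%
  summarize(freq = n(), .groups = "drop")
# Wide species x plot table: species in the first column, 0 where absent
mat <- dcast(df, species ~ plot, value.var = "freq", fill = 0)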
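# Hard-coded species x plot matrix (rows = species, columns = plots) used in the examples that follow
mat<- matrix(c(1,0,0,0,0,0,1,0,0,0,0,0,0,0,2,0,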
2,0,0,0,1,0,0,0,0,0,0,0,0,0,1,0,
0,0,1,1,1,0,0,0,0,0,0,0,0,0,0,0,
0,1,0,0,0,1,0,0,0,0,0,0,0,0,0,0,
0,0,0,0,1,0,0,1,0,1,1,0,0,1,0,1,
1,1,0,0,0,0,0,0,1,0,1,2,1,0,0,0), nrow=16, ncol=6)
dimnames(mat)<- list(c("a", "c", "f", "h", "i", "j", "l", "m", "p", "q", "s", "t", "u", "v","x", "z"),
c("1", "2", "3", "4", "5", "6"))
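As an aside, base R's xtabs() can build a species-by-plot matrix directly, with species as row names rather than as a first column (which is what dcast() returns). This is a sketch of an alternative to the dcast() step, not what the code above does; df_toy is a small hypothetical data frame in the same long format as the summarised df.

```r
# Hypothetical toy data in the same long format as the summarised df:
# one row per plot x species combination, with a frequency count
df_toy <- data.frame(plot    = factor(c(1, 1, 2, 2, 3)),
                     species = c("a", "x", "a", "i", "f"),
                     freq    = c(1, 2, 2, 1, 1))

# xtabs() sums freq over every species x plot combination;
# combinations that never occur are filled with 0
tab <- xtabs(freq ~ species + plot, data = df_toy)

# Drop the "xtabs"/"table" class and the stored call to get a plain matrix
mat_toy <- unclass(tab)
attr(mat_toy, "call") <- NULL
print(mat_toy)
```

With the question's summarised df, the analogous call would be xtabs(freq ~ species + plot, data = df).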
I would like to create a loop that iterates through df (or mat) and returns a list of matrices, one per cluster size, where each matrix holds several unique aggregates of plots. With my example data frame, cluster sizes range from 1 plot to all 6 plots combined. For a cluster size of 1, each plot is its own cluster, so the results are simply the frequency of each species in that plot. For a cluster size of 2, a cluster is the aggregation of two plots, and the results are the summed frequencies of each species across those TWO plots. Similarly, for a cluster size of 3, a cluster is the aggregation of THREE plots, and the results are the summed frequencies of each species across those THREE plots.
For a given cluster size n, plots can be aggregated in several different ways to form a cluster of that size. For example, a cluster of size 2 could combine plot 1 & plot 2, plot 2 & plot 3, or plot 5 & plot 10.
I wish to cluster using a moving window, i.e. only consecutive plots are aggregated. So, for a cluster size of 2, plots would be aggregated as follows: 1&2, 2&3, 3&4, 4&5, ..., 11&12.
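A moving window of k consecutive plots over P plots has P - k + 1 possible start positions. A minimal sketch of generating the windows and their labels; make_windows() is a hypothetical helper name, not an existing function.

```r
# Hypothetical helper: plot indices for every moving window of k consecutive
# plots out of n_plots; there are n_plots - k + 1 windows
make_windows <- function(n_plots, k) {
  lapply(seq_len(n_plots - k + 1), function(s) s:(s + k - 1))
}

# Windows and their labels for cluster size 2 with 6 plots
w2 <- make_windows(6, 2)
labels2 <- vapply(w2, paste, character(1), collapse = "_")
print(labels2)
# [1] "1_2" "2_3" "3_4" "4_5" "5_6"

# Number of windows for cluster sizes 1 to 6 with 6 plots
n_windows <- vapply(1:6, function(k) length(make_windows(6, k)), integer(1))
print(n_windows)
# [1] 6 5 4 3 2 1
```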
I imagine the way to go about this is to loop through the original data frame or matrix and output a new matrix for each cluster size. Below I provide examples of output matrices for cluster sizes 1-3 for the example data frame above.
Example output matrices for cluster size 1: Aggregates of 1 plot
mat1<- matrix(c(1,0,0,0,0,0,1,0,0,0,0,0,0,0,2,0), nrow=16, ncol=1)
dimnames(mat1)<- list(c("a", "c", "f", "h", "i", "j", "l", "m", "p", "q", "s", "t", "u", "v","x", "z"),
c("1"))
mat2<- matrix(c(2,0,0,0,1,0,0,0,0,0,0,0,0,0,1,0), nrow=16)
dimnames(mat2)<- list(c("a", "c", "f", "h", "i", "j", "l", "m", "p", "q", "s", "t", "u", "v","x", "z"),
c("2"))
mat3<- matrix(c(0,0,1,1,1,0,0,0,0,0,0,0,0,0,0,0), nrow=16)
dimnames(mat3)<- list(c("a", "c", "f", "h", "i", "j", "l", "m", "p", "q", "s", "t", "u", "v","x", "z"),
c("3"))
mat4<- matrix(c(0,1,0,0,0,1,0,0,0,0,0,0,0,0,0,0), nrow=16)
dimnames(mat4)<- list(c("a", "c", "f", "h", "i", "j", "l", "m", "p", "q", "s", "t", "u", "v","x", "z"),
c("4"))
mat5<- matrix(c(0,0,0,0,1,0,0,1,0,1,1,0,0,1,0,1), nrow=16)
dimnames(mat5)<- list(c("a", "c", "f", "h", "i", "j", "l", "m", "p", "q", "s", "t", "u", "v","x", "z"),
c("5"))
mat6<- matrix(c(1,1,0,0,0,0,0,0,1,0,1,2,1,0,0,0), nrow=16)
dimnames(mat6)<- list(c("a", "c", "f", "h", "i", "j", "l", "m", "p", "q", "s", "t", "u", "v","x", "z"),
c("6"))
Example output matrix for cluster size 2: Aggregates of 2 plots
mat7<- matrix( c(3,0,0,0,1,0,1,0,0,0,0,0,0,0,3,0,
2,0,1,1,2,0,0,0,0,0,0,0,0,0,1,0,
0,1,1,1,1,1,0,0,0,0,0,0,0,0,0,0,
0,1,0,0,1,1,0,1,0,1,1,0,0,1,0,1,
1,1,0,0,1,0,0,1,1,1,2,2,1,1,0,1), nrow=16)
dimnames(mat7)<- list(c("a", "c", "f", "h", "i", "j", "l", "m", "p", "q", "s", "t", "u", "v","x", "z"),
c("1_2", "2_3", "3_4", "4_5", "5_6"))
Example output matrix for cluster size 3: Aggregates of 3 plots
mat8<- matrix( c(3,0,1,1,2,0,1,0,0,0,0,0,0,0,3,0,
2,1,1,1,2,1,0,0,0,0,0,0,0,0,1,0,
0,1,1,1,2,1,0,1,0,1,1,0,0,1,0,1,
1,2,0,0,1,1,0,1,1,1,2,2,1,1,0,1), nrow=16)
dimnames(mat8)<- list(c("a", "c", "f", "h", "i", "j", "l", "m", "p", "q", "s", "t", "u", "v","x", "z"),
c("1_2_3", "2_3_4", "3_4_5", "4_5_6"))
Note that in each matrix, rows are species and columns are the moving-window clusters (plot aggregates) of that cluster size. I have named the column headings to indicate which plots are combined, and ideally the loop would record this information too. Cells are the frequency of each species in a unique aggregate of n plots. Because a moving window over P plots yields P - n + 1 aggregates for cluster size n, the resulting matrices differ in their number of columns.
All matrices can be stored in a list; this is the step I primarily need help with.
mat_list<- list(mat1, mat2, mat3, mat4, mat5, mat6, mat7, mat8 )
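One way to build this list without writing each matrix by hand is sketched below. It assumes mat is the hard-coded species-by-plot matrix from the question and uses a hypothetical helper, cluster_matrix(), that applies rowSums() to each moving window of k plots. Unlike mat_list above, it produces one matrix per cluster size, so cluster size 1 is a single 6-column matrix (equal to mat) rather than six separate matrices mat1 to mat6.

```r
# Species x plot matrix from the question (rows = species, columns = plots)
mat <- matrix(c(1,0,0,0,0,0,1,0,0,0,0,0,0,0,2,0,
                2,0,0,0,1,0,0,0,0,0,0,0,0,0,1,0,
                0,0,1,1,1,0,0,0,0,0,0,0,0,0,0,0,
                0,1,0,0,0,1,0,0,0,0,0,0,0,0,0,0,
                0,0,0,0,1,0,0,1,0,1,1,0,0,1,0,1,
                1,1,0,0,0,0,0,0,1,0,1,2,1,0,0,0), nrow = 16, ncol = 6)
dimnames(mat) <- list(c("a", "c", "f", "h", "i", "j", "l", "m",
                        "p", "q", "s", "t", "u", "v", "x", "z"),
                      as.character(1:6))

# Hypothetical helper: sum species frequencies over every moving window of
# k consecutive plots; column names record which plots were combined
cluster_matrix <- function(mat, k) {
  windows <- lapply(seq_len(ncol(mat) - k + 1), function(s) s:(s + k - 1))
  sums <- vapply(windows, function(j) rowSums(mat[, j, drop = FALSE]),
                 numeric(nrow(mat)))
  labels <- vapply(windows, function(j) paste(colnames(mat)[j], collapse = "_"),
                   character(1))
  # matrix() keeps a species x window shape even if vapply() returned a vector
  matrix(sums, nrow = nrow(mat), dimnames = list(rownames(mat), labels))
}

# One matrix per cluster size, from single plots up to all plots combined
mat_list <- lapply(seq_len(ncol(mat)), function(k) cluster_matrix(mat, k))
names(mat_list) <- paste0("size_", seq_len(ncol(mat)))

mat_list$size_3
```

The loop runs up to ncol(mat), so the same code covers a data set with 12 plots (windows up to "11_12") without changes.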
An extra step I would like to incorporate into the loop is to apply a series of functions to each matrix in the list, adding each function's result as a new column of the matrix. The functions I need to calculate for each cluster matrix are:
Calculate the frequency of each species across all aggregates (i.e. row totals).
Calculate the mean frequency of each species across all aggregates (i.e. row total / number of aggregates).
Calculate the total area for each cluster size, defined here as cluster size * pi * 25 (i.e. 78.54 per plot).
Calculate the frequency per area: mean frequency / area.
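The four quantities can be appended as columns with cbind(). A sketch, assuming a hypothetical helper summarise_cluster() that receives one cluster-size matrix m together with its cluster size k, and that each plot covers 25 * pi as stated above; it is demonstrated on mat8 from the question.

```r
# Hypothetical helper: append summary columns to one cluster-size matrix m,
# where k is the cluster size. Area assumes 25 * pi per plot, as in the question.
summarise_cluster <- function(m, k) {
  freq      <- rowSums(m)         # total frequency across all aggregates
  mean_freq <- freq / ncol(m)     # row total / number of aggregates
  area      <- k * pi * 25        # total area of a cluster of k plots
  cbind(m, freq = freq, mean_freq = mean_freq, area = area,
        freq_per_area = mean_freq / area)
}

# mat8 from the question: cluster size 3, four moving-window aggregates
mat8 <- matrix(c(3,0,1,1,2,0,1,0,0,0,0,0,0,0,3,0,
                 2,1,1,1,2,1,0,0,0,0,0,0,0,0,1,0,
                 0,1,1,1,2,1,0,1,0,1,1,0,0,1,0,1,
                 1,2,0,0,1,1,0,1,1,1,2,2,1,1,0,1), nrow = 16,
               dimnames = list(c("a", "c", "f", "h", "i", "j", "l", "m",
                                 "p", "q", "s", "t", "u", "v", "x", "z"),
                               c("1_2_3", "2_3_4", "3_4_5", "4_5_6")))

res8 <- summarise_cluster(mat8, k = 3)
res8[c("a", "i"), c("freq", "mean_freq", "area", "freq_per_area")]
```

For a list that holds one matrix per cluster size, in order, something like Map(summarise_cluster, mat_list, seq_along(mat_list)) would apply it to every matrix.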
The output data frame for these three cluster sizes will look like result_df:
#df for cluster size 1
result_df1<- data.frame(cluster_size= rep("1", 96),
aggregate_ID= c(rep("1",16), rep("2", 16), rep("3", 16), rep("4", 16), rep("5", 16), rep("6",16)),
species= rep(c("a", "c", "f", "h", "i", "j", "l", "m", "p", "q", "s", "t", "u", "v","x", "z"), 6),
freq= c(1,0,0,0,0,0,1,0,0,0,0,0,0,0,2,0,
2,0,0,0,1,0,0,0,0,0,0,0,0,0,1,0,
0,0,1,1,1,0,0,0,0,0,0,0,0,0,0,0,
0,1,0,0,0,1,0,0,0,0,0,0,0,0,0,0,
0,0,0,0,1,0,0,1,0,1,1,0,0,1,0,1,
1,1,0,0,0,0,0,0,1,0,1,2,1,0,0,0),
mean_freq=c(1,0,0,0,0,0,1,0,0,0,0,0,0,0,2,0,
2,0,0,0,1,0,0,0,0,0,0,0,0,0,1,0,
0,0,1,1,1,0,0,0,0,0,0,0,0,0,0,0,
0,1,0,0,0,1,0,0,0,0,0,0,0,0,0,0,
0,0,0,0,1,0,0,1,0,1,1,0,0,1,0,1,
1,1,0,0,0,0,0,0,1,0,1,2,1,0,0,0),
area= rep(78.54, 96))
result_df1$freq_per_area<- result_df1$mean_freq/78.54
#df for cluster size 2
result_df2<- data.frame( cluster_size= rep("2",80),
aggregate_ID= c(rep("1_2",16), rep("2_3",16), rep("3_4",16), rep("4_5",16), rep("5_6",16)),
species= rep(c("a", "c", "f", "h", "i", "j", "l", "m", "p", "q", "s", "t", "u", "v","x", "z"), 5),
# freq = row totals across the 5 aggregates, repeated for each aggregate
freq= rep(c(6,3,2,2,6,2,1,2,1,2,3,2,1,2,4,2), 5),
mean_freq= rep(c(6,3,2,2,6,2,1,2,1,2,3,2,1,2,4,2)/5, 5),
area= rep(157.08, 80))
result_df2$freq_per_area<- result_df2$mean_freq/157.08
#df for cluster size 3
result_df3<- data.frame( cluster_size= rep("3",64),
aggregate_ID= c(rep("1_2_3",16), rep("2_3_4",16), rep("3_4_5",16), rep("4_5_6",16)),
species= rep(c("a", "c", "f", "h", "i", "j", "l", "m", "p", "q", "s", "t", "u", "v","x", "z"), 4),
# freq = row totals across the 4 aggregates, repeated for each aggregate
freq= rep(c(6,4,3,3,7,3,1,2,1,2,3,2,1,2,4,2), 4),
mean_freq= rep(c(6,4,3,3,7,3,1,2,1,2,3,2,1,2,4,2)/4, 4),
# area = 3 * pi * 25
area= rep(235.62, 64))
result_df3$freq_per_area<- result_df3$mean_freq/235.62
result_df<- rbind(result_df1,result_df2,result_df3)
Note that result_df only includes results for cluster sizes up to 3; for this example data frame, cluster sizes go up to 6 (the number of plots), so the loop would need to iterate up to the maximum cluster size.
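Putting the pieces together, here is a base-R sketch that loops over every cluster size from 1 to ncol(mat) and stacks the results into one long data frame. Two assumptions to flag: the per-aggregate frequency and the cluster-level row total are kept in separate columns (freq and row_total, names chosen for this sketch), because the example result_df uses freq for per-plot values at cluster size 1 but for row totals at cluster sizes 2 and 3; and cluster_size is stored as a number rather than a string. cluster_matrix() is a hypothetical helper.

```r
# Species x plot matrix from the question
mat <- matrix(c(1,0,0,0,0,0,1,0,0,0,0,0,0,0,2,0,
                2,0,0,0,1,0,0,0,0,0,0,0,0,0,1,0,
                0,0,1,1,1,0,0,0,0,0,0,0,0,0,0,0,
                0,1,0,0,0,1,0,0,0,0,0,0,0,0,0,0,
                0,0,0,0,1,0,0,1,0,1,1,0,0,1,0,1,
                1,1,0,0,0,0,0,0,1,0,1,2,1,0,0,0), nrow = 16, ncol = 6)
dimnames(mat) <- list(c("a", "c", "f", "h", "i", "j", "l", "m",
                        "p", "q", "s", "t", "u", "v", "x", "z"),
                      as.character(1:6))

# Hypothetical helper: moving-window sums over k consecutive plots
cluster_matrix <- function(mat, k) {
  windows <- lapply(seq_len(ncol(mat) - k + 1), function(s) s:(s + k - 1))
  sums <- vapply(windows, function(j) rowSums(mat[, j, drop = FALSE]),
                 numeric(nrow(mat)))
  labels <- vapply(windows, function(j) paste(colnames(mat)[j], collapse = "_"),
                   character(1))
  matrix(sums, nrow = nrow(mat), dimnames = list(rownames(mat), labels))
}

# One block of rows per cluster size k = 1, ..., number of plots
result_df <- do.call(rbind, lapply(seq_len(ncol(mat)), function(k) {
  m         <- cluster_matrix(mat, k)
  row_total <- unname(rowSums(m))     # species total across all aggregates
  mean_freq <- row_total / ncol(m)    # row total / number of aggregates
  data.frame(
    cluster_size = k,
    aggregate_ID = rep(colnames(m), each = nrow(m)),
    species      = rep(rownames(m), times = ncol(m)),
    freq         = as.vector(m),      # frequency within this aggregate
    row_total    = rep(row_total, times = ncol(m)),
    mean_freq    = rep(mean_freq, times = ncol(m)),
    area         = k * pi * 25,       # assumes 25 * pi per plot
    row.names    = NULL
  )
}))
result_df$freq_per_area <- result_df$mean_freq / result_df$area

head(result_df)
```

as.vector(m) reads the matrix column by column, which is why aggregate_ID uses each = nrow(m) and species uses times = ncol(m).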