I am new to loops in R and need help writing nested loops. I have a data frame in which each row holds species counts from one site within a region. There are 50 regions, and the number of sites per region varies. For each region I need to calculate a diversity index while incrementally increasing the number of sites sampled, replicating each incremental step 1000 times. For instance:

library(vegan) # provides diversity()

R1 <- subset(df, region=="1") #this needs to be completed for all 50 regions
R1$region<-NULL

max<-nrow(R1)-1

iter <- 1000 #the number of iterations
n <- 1 # the number of rows to be sampled; this needs to increase until "max"
outp <- rep(NA, iter)

for (i in 1:iter){
  d <- sample(1:nrow(R1), size = n, replace=FALSE)
  bootdata <- R1[d,]
  x <- colSums(bootdata) #this is not applicable until n>1
  outp[i] <- 1/diversity(x, index = "simpson")
}

Here is a sample dataset

df <- structure(list(region = c(1L, 1L, 1L, 2L, 2L, 3L, 4L, 4L), Sp1 = c(31L, 
85L, 55L, 71L, 81L, 22L, 78L, 64L), Sp2 = c(10L, 84L, 32L, 86L, 
47L, 93L, 55L, 35L), Sp3 = c(86L, 56L, 4L, 8L, 55L, 47L, 51L, 
95L)), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA, 
-8L), .Names = c("region", "Sp1", "Sp2", "Sp3"), spec = structure(list(
cols = structure(list(region = structure(list(), class = 
c("collector_integer", 
"collector")), Sp1 = structure(list(), class = c("collector_integer", 
"collector")), Sp2 = structure(list(), class = c("collector_integer", 
"collector")), Sp3 = structure(list(), class = c("collector_integer", 
"collector"))), .Names = c("region", "Sp1", "Sp2", "Sp3")), 
default = structure(list(), class = c("collector_guess", 
"collector"))), .Names = c("cols", "default"), class = "col_spec"))

In short, for each region I need to calculate Simpson's index for a single randomly sampled site, repeated 1000 times. Then I need to calculate the index again for 2 randomly sampled sites after their columns have been summed, again 1000 times, then for 3 sites, and so on until max.

I am also struggling with how to store the output. I would like one data frame per region, with one column for each value of n (up to max) holding the 1000 iterations.

Many thanks in advance

1 Answer

You can write a function that processes one region at a time. Then split your data by region into a list and apply the function to each list element using sapply.

bootstrapByRegion <- function(R) {
  rgn <- unique(R$region)
  message(sprintf("Processing %s", rgn))
  R$region <- NULL

  nmax <- nrow(R)-1

  if (nmax == 0) stop(sprintf("Trying to work on one row. No dice. Manually exclude region %s or handle otherwise.", rgn))

  iter <- 1000 #the number of iterations
  # pre-allocate the result
  output <- matrix(NA, nrow = iter, ncol = nmax)

  for (i in 1:nmax) {
    # column i holds iter replicates of the index for i sampled sites
    output[, i] <- replicate(iter, expr = {
      d <- sample(1:nrow(R), size = i, replace=FALSE)
      bootdata <- R[d, , drop = FALSE]
      x <- colSums(bootdata) # also works when only one row is sampled
      outp <- 1/diversity(x, index = "simpson")
      outp
    })
  }
  output
}

xy <- split(df, f = df$region)
result <- sapply(xy, FUN = bootstrapByRegion) # list element is taken as R
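
One detail in the loop body may be worth double-checking. In vegan, diversity(x, index = "simpson") returns 1 - D, where D is the sum of squared proportions, so 1/diversity(x, index = "simpson") evaluates to 1/(1 - D). If the inverse Simpson index 1/D is what is wanted, vegan also accepts index = "invsimpson". The base-R sketch below computes both quantities by hand for the first site of the sample data; the names counts, p, D, one_over_simpson and inv_simpson are only for illustration.

```r
counts <- c(Sp1 = 31, Sp2 = 10, Sp3 = 86)  # first row of the sample data
p <- counts / sum(counts)                  # proportional abundances
D <- sum(p^2)                              # sum of squared proportions

# what 1/diversity(x, index = "simpson") evaluates to
one_over_simpson <- 1 / (1 - D)
# what diversity(x, index = "invsimpson") returns
inv_simpson <- 1 / D
```

The two values differ, so it is worth deciding which one the analysis actually needs before running it on all 50 regions.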

Since region 3 has only one row, bootstrapByRegion will stop with an error for it (nrow(R)-1 is 0). You can exclude such regions in a number of ways. One option is to filter the list before applying the function:

result <- sapply(xy[sapply(xy, nrow) > 1], FUN = bootstrapByRegion)
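
To get the output in the shape asked for in the question (one data frame per region, one column per number of sites), the matrix can be given column names and converted before it is returned. The following is only a sketch under stated assumptions: bootstrap_region_df and simpson are hypothetical names, simpson is a base-R stand-in for vegan's diversity(x, index = "simpson") so that the example runs without vegan, single-site regions are skipped by returning NULL, and iter is lowered to 100 to keep the run short.

```r
# Sample data from the question, as a plain data.frame
df <- data.frame(
  region = c(1L, 1L, 1L, 2L, 2L, 3L, 4L, 4L),
  Sp1 = c(31L, 85L, 55L, 71L, 81L, 22L, 78L, 64L),
  Sp2 = c(10L, 84L, 32L, 86L, 47L, 93L, 55L, 35L),
  Sp3 = c(86L, 56L, 4L, 8L, 55L, 47L, 51L, 95L)
)

# Assumption: stand-in for vegan::diversity(x, index = "simpson"),
# computed as 1 - sum(p^2)
simpson <- function(x) {
  p <- x / sum(x)
  1 - sum(p^2)
}

# Hypothetical helper: returns one data frame for a region,
# with one column per number of sampled sites (n1, n2, ...)
bootstrap_region_df <- function(R, iter = 1000) {
  R$region <- NULL
  nmax <- nrow(R) - 1
  if (nmax < 1) return(NULL)  # single-site region: nothing to resample
  out <- matrix(NA_real_, nrow = iter, ncol = nmax,
                dimnames = list(NULL, paste0("n", seq_len(nmax))))
  for (k in seq_len(nmax)) {
    out[, k] <- replicate(iter, {
      d <- sample(seq_len(nrow(R)), size = k, replace = FALSE)
      1 / simpson(colSums(R[d, , drop = FALSE]))
    })
  }
  as.data.frame(out)
}

set.seed(1)  # only to make the sketch reproducible
res <- lapply(split(df, df$region), bootstrap_region_df, iter = 100)
res <- Filter(Negate(is.null), res)  # drop the regions that were skipped
```

Here lapply is used instead of sapply so that the result is always a named list, one data frame per remaining region. For example, res[["1"]] has columns n1 and n2, and colMeans(res[["1"]]) gives the mean index for each number of sites.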
Roman Luštrik
  • Many thanks Roman, this addresses how to break it into regions, but now I need to run the "sample" function for n to max, 1000 times. @RomanLuštrik – Jeremiah Plass-Johnson Jun 08 '17 at 16:00
  • @JeremiahPlass-Johnson see my edit. Region 3 will not work because there's only one row. You'll have to handle this somehow, either in a function or exclude the region before running the bootstrapping procedure. – Roman Luštrik Jun 08 '17 at 16:27