
I'm trying to build a nested list from one original data frame using two for loops: an outer list of sites, where each site holds one data frame per year.

The first loop runs over the original data frame and uses the levels of the `Site` factor as the index to group the data by site, creating a list of sites.

The second loop (the one I'm having problems with) should create data frames inside each site's list, grouped by year.

library(dplyr)

set.seed(100)

N <- sample(50, 100, replace = TRUE)

Year <- as.factor(sample(rep(2011:2020, each = 5)))

Site <- as.factor(sample(rep(c('S1', 'S2', 'S3', 'S4', 'S5'), each = 10)))

Species <- sample(rep(c('spp1', 'spp2', 'spp3', 'spp4', 'spp5'), each = 10))

DataBase <- data.frame(Year, Site, Species,  N)

Ind <- list()
Ind_year <- list()

for (i in levels(DataBase$Site)) {
   Ind[[i]] <- DataBase %>%
      filter(Site == as.character(i)) %>%
      group_by(Year, Species) %>%
      count() %>%
      droplevels()

   for (j in levels(Ind[[i]]$Year)) {
      Ind_year[[j]] <- as.data.frame(Ind[[i]] %>%
                                        filter(Year == as.character(j)) %>%
                                        group_by(Year, Species) %>%
                                        droplevels())
   }
}

No error is raised, but the result within the first list looks like this: Site 1, Site 2, Site 3, ..., Year 1, Year 2, Year 3.

What I want instead is nesting: for example, the Site 1 list within the `Ind` list should contain the data frames for Year 1 ... Year n.

Any help would be appreciated.

  • It's easier to help you if you include a simple [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) with sample input and desired output that can be used to test and verify possible solutions. – MrFlick Mar 09 '21 at 00:49
  • Done. Thank you. I'm new at asking questions here. – S. Guerrero Mar 09 '21 at 01:23

2 Answers


You seem to be very close to the solution. If I understood your problem correctly, only two more lines are needed; I also cleaned up your code a little. One slightly unfortunate aspect is that the year is a number: if it is used directly as a list index, you get an entry at the list position given by the year number instead of a named list entry. So I converted the years to character before running the loop:

set.seed(100)
library(dplyr)
# Your dummy data - we do not need factors, but having the year as character is very helpful
DataBase <- data.frame(Year = as.character(sample(rep(2011:2020, each = 5))), 
                       Site = sample(rep(c('S1', 'S2', 'S3', 'S4', 'S5'), each = 10)), 
                       Species = sample(rep(c('spp1', 'spp2', 'spp3', 'spp4', 'spp5'), each = 10)), 
                       N = sample(50, 100, replace = TRUE))

Ind <- list()
Ind_year <- list()

for (i in unique(DataBase$Site)) {
  Ind[[i]] <- DataBase %>% 
    dplyr::filter(Site == i) %>% 
    dplyr::count(Year, Species) 

  for(j in unique(Ind[[i]]$Year)) {
    Ind_year[[j]] <- Ind[[i]] %>% 
      dplyr::filter(Year == j) %>%
      dplyr::group_by(Year, Species)
  }
  # store the inner loop's list of per-year data frames under the current site
  Ind[[i]] <- Ind_year
  # as a precaution, reset the inner list so results from the previous site cannot be reused
  Ind_year <- NULL
}

Ind$S1$`2012`
# A tibble: 2 x 3
# Groups:   Year, Species [2]
  Year  Species     n
  <chr> <chr>   <int>
1 2012  spp3        2
2 2012  spp5        2
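
The point about years as list indices can be checked in isolation. The following is only a minimal sketch with made-up object names (`by_position`, `by_name`), not part of the code above: a numeric index is treated as a position, while a character index creates a named entry.

```r
# Numeric index: the value is placed at position 2012,
# and all earlier positions are filled with NULL.
by_position <- list()
by_position[[2012]] <- "a"
length(by_position)   # 2012

# Character index: a single entry named "2012" is created.
by_name <- list()
by_name[["2012"]] <- "a"
length(by_name)       # 1
names(by_name)        # "2012"
```

This is why the years are converted with `as.character()` before they are used as `j` in the inner loop.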

I hope this is what you were looking for.

DPH

You can split by multiple columns:

result <- split(DataBase, list(DataBase$Site, DataBase$Year))

Or, if you want a nested list, you can use `split` with `lapply`:

result <- lapply(split(DataBase, DataBase$Site), function(x) split(x, x$Year))
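
As a rough illustration of both variants, here is a small sketch on a hand-made data frame (`toy` and its values are hypothetical, not the question's `DataBase`). It assumes `Site` and `Year` are factors, as in the question; in that case `split` keeps a (possibly empty) element for every factor level unless `drop = TRUE` is passed.

```r
# Hypothetical toy data: two sites, three years, not every year at every site.
toy <- data.frame(
  Site    = factor(c("S1", "S1", "S1", "S2", "S2")),
  Year    = factor(c("2011", "2011", "2012", "2012", "2013")),
  Species = c("spp1", "spp2", "spp1", "spp3", "spp1"),
  N       = c(4, 7, 1, 9, 3)
)

# Flat list: one element per Site/Year combination that actually occurs.
flat <- split(toy, list(toy$Site, toy$Year), drop = TRUE)
names(flat)

# Nested list: sites on the outside, years on the inside.
# drop = TRUE removes years that were never recorded at a given site.
nested <- lapply(split(toy, toy$Site), function(x) split(x, x$Year, drop = TRUE))

names(nested)       # the sites
names(nested$S1)    # only the years present at S1
nested$S1$`2011`    # data frame for site S1, year 2011
```

Year names are not syntactic R names, so they need backticks with `$` (as in `nested$S1$`2011``) or can be accessed with `nested[["S1"]][["2011"]]`.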
Ronak Shah