0

I am trying to split a data frame into multiple data frames, one for each value of the column plot. Previously, I used dplyr to filter the data on a condition and select the columns I want to keep (see below). Instead of copying and pasting the same code X times, I want to use a for loop to reduce the number of lines of code.

data.p1 <- data %>% 
  filter(plot==1) %>%
  select(posX, posY, germ_bin)

data.p2 <- data %>% 
  filter(plot==2) %>%
  select(posX, posY, germ_bin)

After splitting the original data frame data into separate data frames (e.g. data.p1), I would like to apply a function such as raster to each of them. Is it possible to include this function in the for loop as well?

Cameron So
  • Try `out <- split(subset(data, select = c(posX, posY, germ_bin)), data$plot)`. Then do `lapply(out, raster)` (or use `by()` instead of `lapply(split(...))`). BTW: `data` and `plot` are function names in R. – markus Jan 28 '20 at 22:44
  • Possible dupe: [How to split a data frame?](https://stackoverflow.com/questions/3302356/how-to-split-a-data-frame) – markus Jan 28 '20 at 22:46
  • Based on your recommendation + the other thread, I have the following: `out <- subset(data, select = c(posX, posY, germ_bin, flwr_bin, seed_bin, plot))` followed by `out <- split(out, f=as.factor(out$plot))` and `lapply(out, rasterFromXYZ)`. Note I separated the original `split` line into 2 because subsetting the data without plot initially made it difficult to apply the `split` function. I want to apply another function called `focalWeight` to each df in the list, but I need to specify the df being applied. How would I achieve this with `lapply` or its family of functions? – Cameron So Jan 28 '20 at 23:15
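
The last comment asks how to name the data frame being processed inside `lapply`. One possible sketch, assuming a toy list in place of the real `split()` output and a hypothetical helper `germ_rate` (neither is from the thread), is to wrap the call in an anonymous function so each data frame arrives as an explicit argument:

```r
# Toy list standing in for the result of split(); the values are made up.
out <- list(
  `1` = data.frame(posX = c(1, 2), posY = c(1, 1), germ_bin = c(0, 1)),
  `2` = data.frame(posX = c(1, 2, 3), posY = c(2, 2, 2), germ_bin = c(1, 1, 0))
)

# Hypothetical helper: takes one data frame plus an extra argument.
germ_rate <- function(df, digits) round(mean(df$germ_bin), digits)

# The anonymous function receives each data frame in turn as `df`.
rates <- lapply(out, function(df) germ_rate(df, digits = 2))

# Shorthand: arguments after the function name are passed on to every call.
rates2 <- lapply(out, germ_rate, digits = 2)
```

The same pattern should carry over to any function that takes the data frame (or a raster built from it) as one of several arguments; the list names are kept, so results can be looked up by plot.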

2 Answers

1

You can subset your data and apply the raster function within the same lapply call. The one thing to be careful about is that if you attach both the dplyr and raster packages, the select function will clash, because each package defines its own select. Probably the best approach is to attach just one package (for example dplyr) and use the :: notation to refer to functions from the other (for example raster), as in raster::raster. Here is an example that applies rasterFromXYZ to each subset of the data.

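library(dplyr)

# Build one raster per unique value of plot.
# raster is called through :: so it is never attached and cannot mask dplyr's select().
data_list <- lapply(unique(data$plot), function(i) {
  raster::rasterFromXYZ(
    data %>%
      filter(plot == i) %>%
      select(posX, posY, germ_bin)
  )
})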
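
As an alternative that avoids the select clash altogether, the subsetting can be done in base R. The following is a minimal sketch, assuming made-up data with the column names from the question; the names `plots`, `keep` and `by_plot` are illustrative only:

```r
# Made-up example data using the column names from the question.
plots <- data.frame(
  plot     = c(1, 1, 2, 2, 2),
  posX     = c(1, 2, 1, 2, 3),
  posY     = c(1, 1, 2, 2, 2),
  germ_bin = c(0, 1, 1, 1, 0)
)

# Keep only the wanted columns, then split the rows by plot.
# The result is a list of data frames named after the plot values.
keep <- c("posX", "posY", "germ_bin")
by_plot <- split(plots[keep], plots$plot)

# If the raster package is installed, each element could then be converted:
# rasters <- lapply(by_plot, raster::rasterFromXYZ)
```

Because the grouping vector is passed to split separately, the plot column does not need to be among the kept columns.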
0

Something like this might be helpful for what you are trying to achieve:

library(dplyr)
data <- data.frame(
  plot = sample(4, 5, replace = TRUE),
  posX = sample(100, 5, replace = TRUE),
  posY = sample(100, 5, replace = TRUE),
  germ_bin = sample(100, 5, replace = TRUE)
)
#   plot posX posY germ_bin
# 1    3   55   88       74
# 2    1   72   15       34
# 3    2   54   15       24
# 4    4   39   42       13
# 5    4   83   71       95
list_of_df <- list()
for (i in 1:max(data$plot)) {
  list_of_df[[i]] <- data %>% 
    dplyr::filter(plot==i) %>%
    dplyr::select(posX, posY, germ_bin)
  # possibly other functions here
}
# [[1]]
#   posX posY germ_bin
# 1   72   15       34
# [[2]]
#   posX posY germ_bin
# 1   54   15       24
# [[3]]
#   posX posY germ_bin
# 1   55   88       74
# [[4]]
#   posX posY germ_bin
# 1   39   42       13
# 2   83   71       95
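
Note that looping over 1:max(data$plot) produces an empty data frame for any plot number that does not occur in the data. A possible variant, sketched here with made-up data and illustrative names (`plots`, `plot_ids`), loops over the plot values that are actually present and names the list elements after them:

```r
# Made-up data with non-consecutive plot numbers.
plots <- data.frame(
  plot     = c(3, 1, 7, 7),
  posX     = c(55, 72, 39, 83),
  posY     = c(88, 15, 42, 71),
  germ_bin = c(74, 34, 13, 95)
)

# Only the plot values that occur in the data, in ascending order.
plot_ids <- sort(unique(plots$plot))

# Pre-allocate the list and name each slot after its plot value.
list_of_df <- vector("list", length(plot_ids))
names(list_of_df) <- plot_ids

for (k in seq_along(plot_ids)) {
  rows <- plots$plot == plot_ids[k]
  list_of_df[[k]] <- plots[rows, c("posX", "posY", "germ_bin")]
  # further per-plot processing could go here
}
```

Elements can then be retrieved by plot value, for example list_of_df[["7"]].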
bhakyuz
  • Unfortunately this does not correctly select for the 3 variables `posX` `posY` and `germ_bin` and outputs a double vector instead of multiple dfs. – Cameron So Jan 28 '20 at 23:23
  • That is correct; it happened because I used single brackets when assigning the new data frames to list elements. I have updated the answer with a small example that you can take a look at. – bhakyuz Jan 29 '20 at 22:42