I have a large data frame containing all of the data for my project, and I am trying to split it into a series of smaller data frames, one for each combination of values in two columns. For the example data below, I need a script that produces one data frame per unique pair of year and colony (e.g. year = 2012, colony = A; year = 2012, colony = B).

year <- c(2012, 2012, 2012, 2012, 2013, 2013, 2013, 2013, 2014, 2014, 2014, 2014)
colony <- c('A', 'A', 'B', 'B', 'A', 'A', 'B', 'B', 'A', 'A', 'B', 'B')
measurement <- c(4,6,1,4,8,2,1,5,4,1,3,8)
data <- data.frame(year, colony, measurement) 

At the moment, the best I can do is produce each one individually:

library(dplyr)  # provides filter()
A2012 <- filter(data, colony == 'A' & year == 2012)
B2012 <- filter(data, colony == 'B' & year == 2012)

etc. However, there are about 80 data frames to produce, so I would like to automate this if possible. Does anyone know a quicker way to do it?

  • `myDfList <- split(df, interaction(df$colony, df$year))` will result in a list of data.frames specified as you mentioned. See gregor's answer to [this post](http://stackoverflow.com/questions/17499013/how-do-i-make-a-list-of-data-frames) for tips on working with such objects. – lmo Jan 18 '17 at 15:47
  • @lmo amazing! Thanks so much – unknown Jan 18 '17 at 15:55
  • Nice, did not know about the interaction function. – thc Jan 18 '17 at 20:22
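
As a minimal sketch of the `split()` suggestion in the comment above, applied to the question's sample data (the name `by_group` and the `drop = TRUE` argument are additions for illustration, not part of the original comment):

```r
# Sample data from the question
data <- data.frame(
  year = rep(c(2012, 2013, 2014), each = 4),
  colony = rep(c("A", "A", "B", "B"), times = 3),
  measurement = c(4, 6, 1, 4, 8, 2, 1, 5, 4, 1, 3, 8)
)

# One data frame per colony/year combination.
# drop = TRUE (an assumption about what is wanted) discards
# combinations that never occur in the data.
by_group <- split(data, interaction(data$colony, data$year, drop = TRUE))

# Elements are named "<colony>.<year>", so they can be looked up by name
names(by_group)
by_group[["A.2012"]]
```

Keeping the pieces in one named list, rather than creating 80 separate variables, makes it straightforward to loop over them later with `lapply()`.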

1 Answer

You can combine group_by() and group_split() from dplyr to split the data frame into a list of smaller data frames, one per combination of colony and year:

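library(dplyr)

data %>% 
  as_tibble() %>%               # convert the data.frame to a tibble
  group_by(colony, year) %>%    # one group per colony/year pair
  group_split()                 # returns a list with one tibble per group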

## <list_of<
##   tbl_df<
##     year       : double
##     colony     : character
##     measurement: double
##   >
## >[6]>
## [[1]]
## # A tibble: 2 x 3
##    year colony measurement
##   <dbl> <chr>        <dbl>
## 1  2012 A                4
## 2  2012 A                6
## 
## [[2]]
## # A tibble: 2 x 3
##    year colony measurement
##   <dbl> <chr>        <dbl>
## 1  2013 A                8
## 2  2013 A                2
##
## (...)
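
The list returned by group_split() is unnamed. One possible way to label each piece, sketched here under the assumption that names like `A2012` from the question are wanted (the variables `grouped`, `keys` and `pieces` are illustrative), is to build the names from group_keys(), which lists the groups in the same order:

```r
library(dplyr)

# Sample data from the question
data <- data.frame(
  year = rep(c(2012, 2013, 2014), each = 4),
  colony = rep(c("A", "A", "B", "B"), times = 3),
  measurement = c(4, 6, 1, 4, 8, 2, 1, 5, 4, 1, 3, 8)
)

grouped <- data %>% group_by(colony, year)

# group_keys() returns one row per group, in the same order
# as the list produced by group_split()
keys <- group_keys(grouped)
pieces <- group_split(grouped)

# Build names such as "A2012" from the key columns
names(pieces) <- paste0(keys$colony, keys$year)

pieces[["A2012"]]
```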

You may also want to look at group_map() and its relatives group_modify() and group_walk(), which let you apply a function to each group as if it were an individual data frame.
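
As a rough illustration of the group_map() approach, the sketch below computes the mean measurement for each colony/year group without creating the intermediate data frames (taking the mean is only an example operation, and `means` is an illustrative name):

```r
library(dplyr)

# Sample data from the question
data <- data.frame(
  year = rep(c(2012, 2013, 2014), each = 4),
  colony = rep(c("A", "A", "B", "B"), times = 3),
  measurement = c(4, 6, 1, 4, 8, 2, 1, 5, 4, 1, 3, 8)
)

# group_map() calls the function once per group and returns a list.
# Inside the formula, .x holds the rows of the current group
# (without the grouping columns) and .y holds its keys.
means <- data %>%
  group_by(colony, year) %>%
  group_map(~ mean(.x$measurement))

means
```

If the per-group result is itself a data frame, group_modify() may be a better fit, since it binds the results back into a single grouped tibble.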