
I am trying to write a loop that subsets the data frame by year and stores each subset in its own new data frame.

df1<-data.frame(ID=c(1,1,1,1,2,2,2,2,3,3,3,3,4,4,4,4,5,5,5,5),
                year=c("2021","2021","2022","2023","2021","2021","2022","2023","2021","2021","2022","2023",
                       "2021","2021","2022","2023","2021","2021","2022","2023"),
                x=c(2,4,5,9,9,7,5,3,2,4,5,9,9,7,5,3,6,8,3,4))

I know I can do this easily with the subset() function. However, I will eventually be working with a large dataset covering 30+ years (let's say data from 1990 to 2020), so I assume there is an easier way than subsetting each year by hand. Basically, I would like results similar to:

d2021<-subset(df1,year=="2021")
d2022<-subset(df1,year=="2022")
d2023<-subset(df1,year=="2023")

but produced by a loop, so I do not have to type the above out for each of the 30 years in my actual dataset. I have tried the following, based on something I found online, but it is not working:

for(i in unique(df1$year)){
  if(any(variable.names(df1)==i)){
    assign(i,df1[,c(i)])
  }
}

which creates no new data frames; the only result I get is:

> i
[1] "2023"
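The loop above never assigns anything: `variable.names(df1)` returns the column names (`ID`, `year`, `x`), none of which equals a year value, so the `if` condition is always `FALSE` and `i` simply ends up holding the last year iterated. Even if the condition were true, `df1[, c(i)]` would select a column rather than rows. A minimal corrected sketch, assuming the goal is objects named `d2021`, `d2022`, ... in the global environment (the `d` prefix follows the names used above), indexes rows by year inside the loop:

```r
# Example data from the question
df1 <- data.frame(ID = c(1,1,1,1,2,2,2,2,3,3,3,3,4,4,4,4,5,5,5,5),
                  year = c("2021","2021","2022","2023","2021","2021","2022","2023",
                           "2021","2021","2022","2023","2021","2021","2022","2023",
                           "2021","2021","2022","2023"),
                  x = c(2,4,5,9,9,7,5,3,2,4,5,9,9,7,5,3,6,8,3,4))

# For each distinct year, keep only the rows for that year and store them
# in a new data frame named d<year> (e.g. d2021) in the global environment
for (yr in unique(df1$year)) {
  assign(paste0("d", yr), df1[df1$year == yr, ], envir = globalenv())
}
```

The same loop works unchanged for 30+ years, because `unique(df1$year)` picks up every year present in the data.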

I need to subset and store the data for each year because I will be doing further analyses (MCPs and RSF functions) in which the data must be split by year, and I will need to call each year's data frame to run different types of analyses on different years.

I also know the split() function will split my data by year; however, this returns a single list with one data frame per year, not separate new data frames.

x<-split(df1,df1$year)
str(x)
List of 3
 $ 2021:'data.frame':   10 obs. of  3 variables:
  ..$ ID  : num [1:10] 1 1 2 2 3 3 4 4 5 5
  ..$ year: chr [1:10] "2021" "2021" "2021" "2021" ...
  ..$ x   : num [1:10] 2 4 9 7 2 4 9 7 6 8
 $ 2022:'data.frame':   5 obs. of  3 variables:
  ..$ ID  : num [1:5] 1 2 3 4 5
  ..$ year: chr [1:5] "2022" "2022" "2022" "2022" ...
  ..$ x   : num [1:5] 5 5 5 5 3
 $ 2023:'data.frame':   5 obs. of  3 variables:
  ..$ ID  : num [1:5] 1 2 3 4 5
  ..$ year: chr [1:5] "2023" "2023" "2023" "2023" ...
  ..$ x   : num [1:5] 9 3 9 3 4
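If separate objects are really needed, the list returned by split() can be turned into them in one step with base R's `list2env()`, which creates one object per named list element in the given environment. The list names are the years (`"2021"`, ...), which are not syntactic R names, so this sketch prefixes them with `d` (an assumption, chosen to match `d2021` above):

```r
# Example data from the question
df1 <- data.frame(ID = c(1,1,1,1,2,2,2,2,3,3,3,3,4,4,4,4,5,5,5,5),
                  year = c("2021","2021","2022","2023","2021","2021","2022","2023",
                           "2021","2021","2022","2023","2021","2021","2022","2023",
                           "2021","2021","2022","2023"),
                  x = c(2,4,5,9,9,7,5,3,2,4,5,9,9,7,5,3,6,8,3,4))

# Split once into a named list of data frames, one element per year
by_year <- split(df1, df1$year)

# Prefix the names so each one is a valid object name (d2021, d2022, ...)
names(by_year) <- paste0("d", names(by_year))

# Create one data frame object per list element in the global environment;
# invisible() suppresses printing of the returned environment
invisible(list2env(by_year, envir = globalenv()))
```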

  • `split(df1, ~ year)` will return a list of data frames split by year which you can map over for your subsequent analyses. – Ritchie Sacramento Apr 05 '22 at 00:45
  • Hi @RitchieSacramento, I have tried the split function, but it seems I would still have to assign each data frame individually after splitting. I am hoping to have a unique data frame for each year that I can store and use later, instead of needing to call the level or assign each level to its own data frame. I am not sure if that makes sense or if I am misunderstanding something. – Margaret Hughes Apr 05 '22 at 01:05
  • Assigning all the data frames back to their own objects in the environment, rather than keeping them as a list of data frames, is often considered an anti-pattern; usually it's a sign that a data management strategy just hasn't been thought through well. I'm adding a relevant link up top now that you've added clarification, but voting to keep this closed – camille Apr 06 '22 at 22:50
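Keeping the data as a list, as the comments suggest, still lets you pull out any single year by name and run different analyses on different years. A minimal base R sketch, where `mean(x)` and `range(x)` are placeholders standing in for the MCP/RSF steps (not those analyses themselves):

```r
# Example data from the question
df1 <- data.frame(ID = c(1,1,1,1,2,2,2,2,3,3,3,3,4,4,4,4,5,5,5,5),
                  year = c("2021","2021","2022","2023","2021","2021","2022","2023",
                           "2021","2021","2022","2023","2021","2021","2022","2023",
                           "2021","2021","2022","2023"),
                  x = c(2,4,5,9,9,7,5,3,2,4,5,9,9,7,5,3,6,8,3,4))

# One named list of data frames; names are the years
by_year <- split(df1, df1$year)

# Access a single year's data frame by name, just like a separate d2021 object
d2021 <- by_year[["2021"]]

# Apply the same (placeholder) analysis to every year; returns a vector named by year
mean_x <- sapply(by_year, function(d) mean(d$x))

# Run a different (placeholder) analysis only on selected years
range_x_late <- lapply(by_year[c("2022", "2023")], function(d) range(d$x))
```

With 30+ years the list simply has more elements, and the results stay labelled by year instead of being scattered across dozens of objects.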
