I need to subset a data frame into smaller data frames by age. I want to write a function for this but do not know how to avoid writing each variable name.

Below is how I have accomplished this in the past:

    CompletedStudy <- StudyCompletion %>%
      subset(complete == 1)

    CompletedStudy_Under24months <- CompletedStudy %>%
      subset(child_age < 24)

    CompletedStudy_Over24months <- CompletedStudy %>%
      subset(child_age >= 24)

I want to create something like the function below (pseudocode, not valid R), where the name of the input is reused to name the outputs:

    CompletedStudy <- StudyCompletion %>%
      subset(complete == 1)

    AgeCompletion <- function(x) {

      x+"_Under24Months" <- x %>%
        subset(child_age < 24)

      x+"_Over24Months" <- x %>%
        subset(child_age >= 24)
    }

    AgeCompletion(CompletedStudy)

Is this possible?

  • `age_complete_list = split(CompletedStudy, CompletedStudy$child_age < 24)` should do it, giving you the resulting data frames in a `list()`. If you had more than 2 age categories, using `cut()` on the age column would be a good idea. It's generally nicer to have similar objects in a `list` rather than have a bunch of separate objects with variables as part of the name. See my answer at [How to make a list of data frames](https://stackoverflow.com/questions/17499013/how-do-i-make-a-list-of-data-frames/24376207#24376207) for some discussion and examples. – Gregor Thomas Sep 08 '20 at 15:33
  • Is there any way to automate writing this out any further? I have to divide this into 10 age ranges and into 10 different components of study completion. So I would have to write that code 100 times. If I could create a function that could be repeated that would be ideal. I have projects like this all the time, so I am wondering if there are any other short cuts to this kind of code. – Debby Zemlock Sep 08 '20 at 23:04
  • You use `cut`, like I said. [Here's a FAQ on it](https://stackoverflow.com/a/5570360/903061). So you might have `completed_by_age = split(CompletedStudy, cut(CompletedStudy$child_age, breaks = c(0, 24, 30, 44, 60, 75, 150)))`, which will create separate data frames for ages 0-24, 24-30, 30-44, 44-60, etc. I think the real question is *why* you're doing this. With `dplyr` or `data.table` for grouped operations, in most cases you're better off leaving your data in a single data frame... – Gregor Thomas Sep 11 '20 at 18:27
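
The `split()` plus `cut()` suggestion from the comments can be wrapped in a small function, roughly as sketched below in base R. The toy data values, the break points in `age_breaks`, and the helper name `split_by_age` are all assumptions made for illustration; only the column names `complete` and `child_age` come from the question. Note that `cut()` builds right-closed bins by default, so `right = FALSE` is used here to keep the question's `< 24` / `>= 24` boundary.

```r
# Toy data: column names follow the question, values are made up.
StudyCompletion <- data.frame(
  id        = 1:8,
  complete  = c(1, 1, 0, 1, 1, 1, 0, 1),
  child_age = c(3, 12, 18, 23, 24, 30, 40, 59)
)

CompletedStudy <- subset(StudyCompletion, complete == 1)

# Hypothetical helper: split a data frame into a named list of data
# frames, one per age bin. right = FALSE makes bins closed on the left,
# so age 24 lands in [24,36) rather than in the bin below it.
split_by_age <- function(df, breaks, age_col = "child_age") {
  age_group <- cut(df[[age_col]], breaks = breaks, right = FALSE)
  split(df, age_group)
}

# Hypothetical break points, in months.
age_breaks <- c(0, 12, 24, 36, 48, 60)

completed_by_age <- split_by_age(CompletedStudy, age_breaks)

# The list is named after the bins; pull out one piece by name.
names(completed_by_age)
completed_by_age[["[24,36)"]]

# Often no split is needed: count rows per bin in a single data frame.
CompletedStudy$age_group <- cut(CompletedStudy$child_age,
                                breaks = age_breaks, right = FALSE)
aggregate(child_age ~ age_group, data = CompletedStudy, FUN = length)
```

Because the result is a list, the same call can be repeated over several data frames with `lapply()` instead of being written out once per subset. Bins with no matching rows still appear in the list as zero-row data frames, and ages outside the break points become `NA` and are dropped by `split()`.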
