I've started writing functions to speed up table generation, but I want the function to respect any grouping the user has already applied earlier in the pipe.
Example data:
df <- data.frame(
  ID = c("A","B","C","A","C","D","A","C","E","B","C","A"),
  Year = c(1,1,1,2,2,2,3,3,3,4,4,4),
  Credits = c(1,3,4,5,6,7,2,1,1,6,1,2),
  Major = c("GS","GS","LA","GS","GS","LA","GS","LA","LA","GS","LA","LA"),
  Status = c("green","blue","green","blue","green","blue","green","blue","green","blue","green","blue"),
  Group = c("Art","Music","Science","Art","Music","Science","Art","Music","Science","Art","Music","Science")
)
The following is the function I'm working on, and it requires/accepts a variable to define cohorts, a credit variable, and a term variable.
library(dplyr)  # provides %>%, group_by(), mutate(), select()
table_headsfte_cohorts <- function(.data, cohortvar, credits, term) {
  # Capture the bare column names passed by the caller
  cohortvar <- rlang::ensym(cohortvar)
  credits <- rlang::ensym(credits)
  term <- rlang::ensym(term)
  .data %>%
    group_by(!!term, Pidm) %>%
    group_by(!!term, !!cohortvar, group_cols()) %>%
    mutate(on3 = 1) %>%
    # Headcount and FTE (credits / 15) per group
    mutate(Headcount = sum(on3),
           FTE = round(sum(na.omit(!!credits)) / 15, 1)) %>%
    mutate(Variable = paste0(cohortvar)) %>%
    mutate(Category = !!cohortvar) %>%
    select(-!!cohortvar) %>%
    select(Variable, Category, Headcount, FTE, group_cols())
}
For a user that may be interested in using additional grouping variables beyond the cohort variable they choose, I am hoping that the end result function would allow usage as follows:
df2 <- df %>%
  group_by(Status, Group) %>%
  table_headsfte_cohorts(Major, Credits, Year)
The desired end result would be a table that respects and preserves the levels of the two grouping variables in the group_by statement above, in addition to the cohortvar and term columns coming from the table_headsfte_cohorts() arguments.
I need to generate this same table, but for a wide range of grouping variables, and varying numbers of grouping variables, so flexibility would be very helpful.
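For context, here is a minimal sketch of how grouping set earlier in the pipe can be inspected, since that is the information the function would need. It assumes dplyr 1.0.0 or later; the `toy` and `grouped` names are hypothetical and used only for illustration.

```r
library(dplyr)

# Small illustrative data frame (hypothetical, not the real data)
toy <- data.frame(
  Status  = c("green", "blue", "green"),
  Group   = c("Art", "Music", "Art"),
  Credits = c(1, 3, 4)
)

# Grouping applied "up the pipe" is stored on the data frame itself
grouped <- toy %>% group_by(Status, Group)

# group_vars() returns the grouping column names as a character vector
upstream <- group_vars(grouped)
print(upstream)

# An ungrouped data frame has no grouping variables
print(length(group_vars(toy)))
```

Because the grouping travels with the data, a function receiving `.data` can in principle read it without the user passing the columns again.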
Edit:
The following seems to get close, by at least allowing multiple grouping variables. This isn't quite what I'm hoping for, as I'd prefer that the additional grouping arguments are read from up the pipe:
table_headsfte_cohorts <- function(.data, cohortvar, credits, term, ...) {
  grps <- enquos(...)
  cohortvar <- rlang::ensym(cohortvar)
  credits <- rlang::ensym(credits)
  term <- rlang::ensym(term)
  .data %>%
    group_by(!!term, !!cohortvar, !!!grps) %>%
    mutate(on3 = 1) %>%
    mutate(Headcount = sum(on3),
           FTE = round(sum(na.omit(!!credits)) / 15, 1)) %>%
    mutate(Variable = paste0(cohortvar)) %>%
    mutate(Category = !!cohortvar) %>%
    select(-!!cohortvar) %>%
    select(Variable, Category, Headcount, FTE, !!!grps)
}
Using the above, I can successfully run:
fdfout <- df %>%
  table_headsfte_cohorts(Major, Credits, Year)
I can also pass the other variables to the function to serve as additional grouping variables:
fdfout_alt <- df %>%
  table_headsfte_cohorts(Major, Credits, Year, Status, Group)
which yields the desired result.
Unfortunately, when I use
fdf_no <- df %>%
  group_by(Status, Group) %>%
  table_headsfte_cohorts(Major, Credits, Year)
the group_by() line appears to have no effect on the output, which would likely confuse someone using my function.
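One possible direction, sketched here rather than offered as a definitive fix: `group_by()` replaces existing groups by default, but its `.add = TRUE` argument (dplyr 1.0.0 or later) appends to them instead. The variant below is hypothetical (the name `table_headsfte_cohorts_grouped` is mine), it keeps the term column in the output as an assumption about the desired layout, and it uses `n()` in place of the `on3` helper column.

```r
library(dplyr)

df <- data.frame(
  ID = c("A","B","C","A","C","D","A","C","E","B","C","A"),
  Year = c(1,1,1,2,2,2,3,3,3,4,4,4),
  Credits = c(1,3,4,5,6,7,2,1,1,6,1,2),
  Major = c("GS","GS","LA","GS","GS","LA","GS","LA","LA","GS","LA","LA"),
  Status = rep(c("green","blue"), 6),
  Group = rep(c("Art","Music","Science"), 4)
)

# Hypothetical variant: keeps groups set before the call
table_headsfte_cohorts_grouped <- function(.data, cohortvar, credits, term) {
  cohortvar <- rlang::ensym(cohortvar)
  credits <- rlang::ensym(credits)
  term <- rlang::ensym(term)
  cohort_name <- rlang::as_string(cohortvar)
  upstream <- group_vars(.data)  # grouping columns set up the pipe

  .data %>%
    # .add = TRUE appends to the existing groups instead of replacing them
    group_by(!!term, !!cohortvar, .add = TRUE) %>%
    mutate(Headcount = n(),
           FTE = round(sum(!!credits, na.rm = TRUE) / 15, 1)) %>%
    ungroup() %>%
    mutate(Variable = cohort_name,
           Category = !!cohortvar) %>%
    # all_of() selects the upstream grouping columns by name
    select(Variable, Category, Headcount, FTE, !!term, all_of(upstream))
}

out <- df %>%
  group_by(Status, Group) %>%
  table_headsfte_cohorts_grouped(Major, Credits, Year)
print(names(out))
```

Like the versions above, this keeps one row per input row because it uses `mutate()`; whether the table should instead be collapsed to one row per group is a separate design choice.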