For example, suppose that you had a function that applied some dplyr functions, but you couldn't expect the datasets passed to it to have the same column names.
For a simplified example of what I mean, say you had a data frame, arizona.trees:
arizona.trees
group  arizona.redwoods  arizona.oaks
A      23                11
A      24                12
B      9                 8
B      10                7
C      88                22
and another very similar data frame, california.trees:
california.trees
group  california.redwoods  california.oaks
A      25                   50
A      11                   33
B      90                   5
B      77                   3
C      90                   35
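To make the example reproducible, here is a minimal sketch that rebuilds both data frames from the tables above. The column types (character groups, numeric counts) are an assumption, since the tables don't state them.

```r
# Rebuild the two example data frames from the tables above.
# Assumption: `group` is character and the tree counts are numeric.
arizona.trees <- data.frame(
  group            = c("A", "A", "B", "B", "C"),
  arizona.redwoods = c(23, 24, 9, 10, 88),
  arizona.oaks     = c(11, 12, 8, 7, 22)
)

california.trees <- data.frame(
  group               = c("A", "A", "B", "B", "C"),
  california.redwoods = c(25, 11, 90, 77, 90),
  california.oaks     = c(50, 33, 5, 3, 35)
)

# Same shape, but every column after `group` has a different name.
colnames(arizona.trees)
colnames(california.trees)
```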
And you wanted to implement a function that returns the mean for the given groups (A, B, ... Z) and a given type of tree, and that works for both of these data frames.
library(dplyr)

foo <- function(dataset, group1, group2, tree.type) {
  column.name <- colnames(dataset[2])
  result <- filter(dataset, group %in% c(group1, group2)) %>%
    select(group, contains(tree.type)) %>%
    group_by(group) %>%
    summarize("mean" = mean(column.name))
  return(result)
}
A desired output for a call of foo(california.trees, "A", "B", "redwoods") would be:
result
   mean
A  18
B  83.5
For some reason, the implementation of foo() above just doesn't work. This is likely due to some error with the data frame indexing: the function seems to think I am attempting to take the mean of the string stored in column.name, rather than retrieving the column with that name and passing it to mean(). I'm not sure how to avoid this. Another possible cause is that the modified data frame is passed along implicitly by the pipe operator, so it can't be referenced directly.
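The suspected behavior can be checked in isolation. The sketch below uses a small hypothetical data frame `df` and assumes a dplyr version that provides the `.data` pronoun; it contrasts passing the string to mean() with looking the column up by name.

```r
library(dplyr)

# Hypothetical three-row data frame, just for the demonstration.
df <- data.frame(
  group               = c("A", "A", "B"),
  california.redwoods = c(25, 11, 90)
)
column.name <- "california.redwoods"

# mean() receives the character string itself, not a column,
# so it returns NA (with a "not numeric or logical" warning).
on.string <- suppressWarnings(mean(column.name))

# The same thing happens inside summarize(): `column.name` is not a
# column of df, so it resolves to the string held in the variable.
bad <- suppressWarnings(
  summarize(group_by(df, group), mean = mean(column.name))
)

# .data[[column.name]] looks the column up by the name in the string.
good <- summarize(group_by(df, group), mean = mean(.data[[column.name]]))
```

If this reading is right, the pipe itself is not the culprit; the string is simply never turned into a column reference.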
Why is this? Is there some alternative implementation that would work?
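One possible alternative, offered as a sketch rather than a definitive answer: find the column whose name contains tree.type, then refer to it through the `.data` pronoun. The name `foo_sketch` is hypothetical, and the sketch assumes exactly one column matches tree.type and that the grouping column is always called `group`.

```r
library(dplyr)

# Example data, rebuilt from the california.trees table.
california.trees <- data.frame(
  group               = c("A", "A", "B", "B", "C"),
  california.redwoods = c(25, 11, 90, 77, 90),
  california.oaks     = c(50, 33, 5, 3, 35)
)

# Hypothetical replacement for foo(); not the original implementation.
foo_sketch <- function(dataset, group1, group2, tree.type) {
  # Pick the column whose name contains tree.type (literal match).
  column.name <- grep(tree.type, colnames(dataset), fixed = TRUE, value = TRUE)
  # Assumption: exactly one column matches.
  stopifnot(length(column.name) == 1)

  dataset %>%
    filter(group %in% c(group1, group2)) %>%
    group_by(group) %>%
    # .data[[...]] retrieves the column named by the string.
    summarize(mean = mean(.data[[column.name]]))
}

result <- foo_sketch(california.trees, "A", "B", "redwoods")
result
```

Note that the group labels and the tree type are passed as quoted strings here; unquoted A, B and redwoods would be looked up as variables.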