So let's say we have this df:
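library(dplyr)
library(tidyr)
library(purrr)
a = c(rep(1,5),rep(0,5),rep(1,5),rep(0,5))
b = c(rep(4,5),rep(3,5),rep(2,5),rep(1,5))
c = c(rep("w",5),rep("x",5),rep("y",5),rep("z",5))
df = data.frame(a,b,c)
# nest a and b into a list-column called "data", one row per value of c
df = df %>%
  nest(data=c(a,b))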
I want to use values from inside the nested "data" column to operate on the whole data frame. For example, I want to use filter() to drop rows where the sum of "a" inside the nested "data" is equal to 0, or arrange the rows of the data frame by the max() of "b". How can I do this?
I came up with a pretty dumb way of doing this, but I'm not happy with it, as it isn't really applicable to the larger datasets I'm working with:
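# helper: return the sum of column a for one nested data frame
sum_column = function(df){
  df = df %>%
    summarize(value=sum(a))
  return(df[[1]][1])
}
# so make a new column with the sum of a, and THEN filter by that
# (map_dbl() gives a plain numeric column rather than a list-column)
df = df %>%
  mutate(sum_of_a = map_dbl(data, ~sum_column(.x))) %>%
  filter(sum_of_a != 0)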
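For comparison, here is a minimal sketch of the kind of thing I'm hoping for, without the helper function or the extra column. It assumes purrr is loaded and that map_dbl() can be called directly inside filter() and arrange(). The names `filtered` and `arranged` are just placeholders for illustration, and I don't know whether this is the recommended approach or how it scales.

```r
library(dplyr)
library(tidyr)
library(purrr)

# Rebuild the example data from above and nest a and b by c
df <- data.frame(
  a = c(rep(1, 5), rep(0, 5), rep(1, 5), rep(0, 5)),
  b = c(rep(4, 5), rep(3, 5), rep(2, 5), rep(1, 5)),
  c = c(rep("w", 5), rep("x", 5), rep("y", 5), rep("z", 5))
) %>%
  nest(data = c(a, b))

# Keep only the rows whose nested "a" does not sum to 0
# (placeholder name: filtered)
filtered <- df %>%
  filter(map_dbl(data, ~ sum(.x$a)) != 0)

# Order the rows by the largest nested "b", ascending
# (placeholder name: arranged)
arranged <- df %>%
  arrange(map_dbl(data, ~ max(.x$b)))
```

Wrapping the arrange() expression in desc() would sort from largest to smallest instead.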