
I want to make a bunch of new variables a,b,c,d.....z to store tibble data frames. I will then rbind the new variables that store tibble data frames and export them as a csv. How do I do this faster without having to specify the new variables each time?

a<- subset(data.frame, variable1="condition1",....,) %>% group_by() %>% summarize( a=mean())
b<-subset(data.frame, variable1="condition2",....,) %>% group_by() %>% summarize( a=mean())
....

z<-subset(data.frame, variable1="condition2",....,) %>% group_by() %>% summarize( a=mean())

rbind(a,b,....,z)

There's got to be a faster way to do this. My data set is large so having it stored in memory as partitions of a,b,c,....z is causing the computer to crash. Typing the subset conditions to form the partitions repeatedly is tedious.

  • It's easier to help you if you include a simple [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) with sample input and desired output that can be used to test and verify possible solutions. – MrFlick Oct 04 '19 at 17:37
  • `subset` doesn't have an argument `variable1` – akrun Oct 04 '19 at 17:37
  • `lapply` or some type of loop would be helpful here – Mike Oct 04 '19 at 18:14

2 Answers

Instead of creating multiple objects in the global environment, read them into a list and bind them together

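library(data.table)
# collect every .csv file in the working directory
files <- list.files(pattern = "\\.csv$", full.names = TRUE)
# read each file with fread and stack the results into one data.table
rbindlist(lapply(files, fread))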

For large files, fread is typically much faster than base R readers such as read.csv
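
As a self-contained sketch of the read-into-a-list-and-bind idea, the snippet below first writes two small CSV files to a temporary directory so there is something to read. The directory name, the file names, and the `source` column are assumptions made only for illustration.

```r
library(data.table)

# Create two small CSV files in a temporary directory (illustration only)
tmp_dir <- file.path(tempdir(), "csv_parts")
dir.create(tmp_dir, showWarnings = FALSE)
fwrite(head(iris, 3), file.path(tmp_dir, "part1.csv"))
fwrite(tail(iris, 3), file.path(tmp_dir, "part2.csv"))

files <- list.files(tmp_dir, pattern = "\\.csv$", full.names = TRUE)

# Name the list by file so that idcol records where each row came from
parts <- lapply(setNames(files, basename(files)), fread)
combined <- rbindlist(parts, idcol = "source")
```

Naming the list before binding is optional; it only serves to keep track of which file each row came from.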


If we are passing strings to group_by, convert each string to a symbol with sym from rlang and unquote it with !!

library(dplyr)
library(purrr)
map2_df(c("condition1", "condition2"), c("a", "b"), ~ df1 %>%
                      group_by(!! rlang::sym(.x)) %>%
                      summarise(!! .y := mean(colname)))

If 'condition1', 'condition2', etc. are expressions, capture them as quosures with quos and unquote them with !! inside filter

map2_df(quos(condition1, condition2), c("a", "b"), ~ df1 %>%
                 filter(!! .x) %>%
                  summarise(!! .y := mean(colname)))

Using a reproducible example

library(dplyr)
library(purrr)
conditions <- quos(Petal.Length > 1.5, Species == 'setosa', Sepal.Length > 5)
map2(conditions, c('a', 'b', 'c'), ~ 
           iris %>% 
                filter(!! .x)  %>%
                summarise(!! .y := mean(Sepal.Length)))
#[[1]]
#         a
#1 6.124779

#[[2]]
#      b
#1 5.006

#[[3]]
#         c
#1 6.129661

If we use map2_dfc instead of map2, the results are column-bound into a single 3-column dataset
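
A minimal sketch of that `map2_dfc` variant, reusing the same iris conditions. The helper argument names `cond` and `nm` and the result name `wide` are illustrative, and `map2_dfc` is assumed to be available in the installed purrr version (it is marked as superseded in purrr 1.0 but still exported).

```r
library(dplyr)
library(purrr)
library(rlang)

conditions <- quos(Petal.Length > 1.5, Species == "setosa", Sepal.Length > 5)

# Each call returns a one-row, one-column summary;
# map2_dfc() binds those summaries side by side
wide <- map2_dfc(conditions, c("a", "b", "c"), function(cond, nm) {
  iris %>%
    filter(!!cond) %>%
    summarise(!!nm := mean(Sepal.Length))
})

names(wide)
```

The result has one row and the columns `a`, `b`, and `c`, one per condition.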

NOTE: It is not clear whether the OP meant 'condition1', 'condition2' as expressions to be passed on for filtering the rows or not.

akrun

You could do something like this using the purrr package:

Depending on what your conditions look like, you may need non-standard evaluation (NSE). See the "Programming with dplyr" vignette for reference.

purrr::map_df(
    c("condition1","condition2",..., "conditionn"), 
    # .x is each condition in turn; note == (comparison), not = (argument)
    ~ subset(your_data_frame, variable1 == .x,....,) %>% group_by(some_columns) %>% summarise(a = mean(some_columns))
)

Example using iris:

library(dplyr)
library(purrr)
library(rlang)
conditions <- c("Petal.Length>1.5","Species == 'setosa'","Sepal.Length > 5")
map(conditions, function(x){
    iris %>% 
        dplyr::filter(!!rlang::parse_expr(x)) %>%
        head()
})

The same approach, counting the rows that match each condition:

conditions <- c("Petal.Length>1.5","Species == 'setosa'","Sepal.Length > 5")
map(conditions, ~ iris %>% dplyr::filter(!!rlang::parse_expr(.x)) %>% nrow())
# or (!! is almost equivalent to eval or rlang::eval_tidy())
map(conditions, ~ iris %>% dplyr::filter(eval(rlang::parse_expr(.x))) %>% nrow())
[[1]]
[1] 113

[[2]]
[1] 50

[[3]]
[1] 118
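
Putting the pieces together for the original goal (summarise each subset, stack the results, export a CSV), one possible sketch is shown below. The labels `a`, `b`, `c`, the grouping column, the summary column name, and the temporary output path are all assumptions chosen for illustration.

```r
library(dplyr)
library(purrr)
library(rlang)

# Hypothetical labels and conditions, chosen only for illustration
conditions <- c(
  a = "Petal.Length > 1.5",
  b = "Species == 'setosa'",
  c = "Sepal.Length > 5"
)

# Summarise one subset at a time, so only the small summaries are stored
# rather than every filtered copy of the data
summaries <- map(conditions, function(cond) {
  iris %>%
    filter(!!parse_expr(cond)) %>%
    group_by(Species) %>%
    summarise(mean_sepal_length = mean(Sepal.Length), .groups = "drop")
})

# Stack the summaries, recording which condition produced each row
result <- bind_rows(summaries, .id = "condition")

# Written to a temporary file here; substitute a real path in practice
out_file <- tempfile(fileext = ".csv")
write.csv(result, out_file, row.names = FALSE)
```

Because each filtered subset is summarised immediately, no intermediate objects such as a, b, ..., z need to be created by hand.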
yusuzech
  • @yifan Every new variable I plan to make is a subset of the data_frame based on a different condition. Going back to my original plan to make variables a, ..., z: is condition 1 for variable a, condition 2 for variable b, ..., condition n for variable z? Conditions 1, ..., n are boolean conditions; do I separate them by commas within the concatenation "c"? –  Oct 04 '19 at 17:31
  • Yes, you can concatenate them using `c`. You need to evaluate each string as an expression using `!!` and `parse_expr`. – yusuzech Oct 04 '19 at 17:54