
I'm trying to create new dataframes from dplyr 0.4.3 functions using R 3.2.2.

What I want to do is create some new dataframes using dplyr::filter to separate out data from one ginormous dataframe into a bunch of smaller dataframes.

For a bog-simple reproducible example, I used this:

filter(mtcars, cyl == 4)

I know I need to assign that to a dataframe of its own, so I started with:

paste("Cylinders:", x, sep = "") <- filter(mtcars, cyl == 4)

That didn't work -- it gave me the error found here: Assignment Expands to Non-Language Object

From there, I found this: Create A Variable Name with Paste in R

(also, big ups to the authors of the above)

And that led me to this, which works:

assign(paste("gears_cars_cylinders", 4, sep = "_"), filter(mtcars, cyl == 4)) %>% 
    group_by(gear) %>% 
    summarise(number_of_cars = n())

and by "works," I mean I get a dataframe named gears_cars_cylinders_4 with all the goodies from

filter(mtcars, cyl == 4) %>% 
        group_by(gear) %>% 
        summarise(number_of_cars = n())

But ultimately, I think I need to wrap this whole thing in a function and be able to feed it the cylinder numbers from mtcars$cyl. I'm thinking something like plyr::ldply(mtcars$cyl, function_name)?

In my real-life data, I have about 70 different classes I need to split out into separate dataframes to drop into DT::datatable tabs in Shiny, which is a whole nuther mess. Anyway.

When I try this:

function_name <- function(x){
    assign(paste("gears_cars_cylinders", x, sep = "_"), filter(mtcars, cyl == x)) %>%
        group_by(gear) %>%
        summarise(number_of_cars = n())
}

and then function_name(6),

I get the output of the dataframe to the screen, but not a dataframe with the name.

Am I looking right over the answer here?

ClintWeathers

1 Answer


You need to assign the new data frames into the environment from which you're calling function_name(). Try something like this:

library(dplyr)

foo <- function(x) {
  assign(paste("gears_cars_cylinders", x, sep = "_"),
         envir = parent.frame(),
         value = mtcars %>% 
           filter(cyl == x) %>% 
           count(gear))
}

for(cyl in sort(unique(mtcars$cyl))) foo(cyl)
ls()
#> [1] "cyl"                    "foo"                   
#> [3] "gears_cars_cylinders_4" "gears_cars_cylinders_6"
#> [5] "gears_cars_cylinders_8"
gears_cars_cylinders_4
#> Source: local data frame [3 x 2]
#> 
#>    gear     n
#>   (dbl) (int)
#> 1     3     1
#> 2     4     8
#> 3     5     2
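
To see why `envir = parent.frame()` matters, here is a minimal base-R sketch. The helper names `assign_local` and `assign_in_caller` and the `cars_cyl_*` object names are hypothetical, made up for this illustration, and `subset()` stands in for `dplyr::filter()` so the snippet runs without extra packages. By default `assign()` writes into the environment it is called from, which inside a function is that function's own temporary frame.

```r
# Hypothetical helpers for illustration only (not from the original post).
assign_local <- function(x) {
  # Default envir is this function's own frame, so the object
  # vanishes as soon as the function returns.
  assign(paste("cars_cyl_local", x, sep = "_"), subset(mtcars, cyl == x))
  invisible(NULL)
}

assign_in_caller <- function(x) {
  # parent.frame() is the environment the function was called from.
  assign(paste("cars_cyl_caller", x, sep = "_"), subset(mtcars, cyl == x),
         envir = parent.frame())
  invisible(NULL)
}

assign_local(6)
assign_in_caller(6)

exists("cars_cyl_local_6")   # FALSE
exists("cars_cyl_caller_6")  # TRUE
nrow(cars_cyl_caller_6)      # 7
```

Note also that the function above passes the whole pipeline as `value`. In `assign(name, filter(...)) %>% ...` only the filtered data frame is stored under the name, because the pipe operates on what `assign()` returns rather than on what it stores.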
jennybryan
  • I can't help but feel this goes against everything I've been taught in R, in terms of grouping similar data in structures like `list`s. And if the point of `dplyr` is to simplify things, mashing it together with `assign` and environment manipulation seems like overkill. `gears <- by(mtcars, mtcars$cyl, FUN=function(x) data.frame(table(x$gear)) )` and then accessing like `gears[["4"]]` seems so much less error-prone. – thelatemail Nov 20 '15 at 06:47
  • Yes that would normally be my thinking too. But maybe there are circumstances where you actually need these data frames as separate objects? – jennybryan Nov 20 '15 at 07:01
  • I honestly can't think of an occasion when that would be necessary. If you have the data.frames of `cyl4` `cyl6` `cyl18` floating about in the GlobalEnv then you need to loop over a `paste`d together vector of `"cyl"` and `i in c(4,6,18)` and use `get()` to retrieve them, when you could just do `gears[[i]]` – thelatemail Nov 20 '15 at 07:18
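
The list-based approach suggested in these comments could be sketched roughly as follows. This is one possible base-R variant using `split()` and `lapply()` rather than the exact `by()` call quoted above, and the names `by_cyl` and `gear_counts` are assumptions chosen for the example.

```r
# Split mtcars into one data frame per cylinder count.
by_cyl <- split(mtcars, mtcars$cyl)

# For each piece, count the cars per gear and return a small data frame.
gear_counts <- lapply(by_cyl, function(d) {
  as.data.frame(table(gear = d$gear), responseName = "number_of_cars")
})

names(gear_counts)   # "4" "6" "8"
gear_counts[["4"]]   # gear counts for the 4-cylinder cars
```

Everything lives in one named list, so each piece is retrieved by name (`gear_counts[["6"]]`) instead of being rebuilt with `paste()` and `get()`. One difference from the `count(gear)` output: `gear` comes back as a factor here, since it is taken from the table's dimnames.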