I have two dataframes and a function, which works when I use it on a single value.

library(tidyverse)

iris1 <- iris
iris2 <- iris

iris_fn <- function(df, species_type) {
  df1 <- df %>%
    filter(Species == species_type)
  return(df1)
}

new_df <- iris_fn(df = iris1, species_type = "setosa")

I want to pass a vector of values to the function and get back a list of three dataframes, each filtered to one value. I have been experimenting with lapply:

variables <- c("setosa", "versicolor", "virginica")

new_df <- lapply(df = iris1, species_type = "setosa", FUN = iris_fn)

The error message is:

Error in is.vector(X) : argument "X" is missing, with no default

I don't understand this, because I have specified the function's arguments and the name of the function.

Can anyone suggest a way to get the desired output? I essentially need a version of lapply, or a purrr function, that accepts both a dataframe and a vector as inputs.

Basil
  • I keep seeing functions written in the style of your `iris_fn` function, but this is a really odd style of writing a function in R. Is this style being taught somewhere where it’s being copied from? Usually one would avoid having temporary variables, assignments and `return` function calls, since all of these elements are unnecessary here and just create clutter. The following is shorter, simpler, cleaner and more in the spirit of R: `iris_fn <- function (df, species_type) { df %>% filter(Species == species_type) }` – Konrad Rudolph Nov 14 '22 at 16:18
  • Not all pipelines are simple tidyverse ones like the one above. Some more complicated functions need intermediate computations before the end result is produced, so at the end you need to return the object that is the output of the function. – Basil Dec 28 '22 at 12:12
  • Sure, of course. But even then, [using `return()` serves no purpose](https://stackoverflow.com/a/59090751/1968). – Konrad Rudolph Dec 28 '22 at 15:09
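To make the point from the comments concrete, here is a small sketch: an R function returns the value of the last expression it evaluates, so in this case the temporary variable and the explicit return() do not change the result. The two function names are made up purely for the comparison.

```r
library(dplyr)

# Question's style: temporary variable plus an explicit return()
iris_fn_return <- function(df, species_type) {
  df1 <- df %>%
    filter(Species == species_type)
  return(df1)
}

# Style suggested in the comments: the value of the last expression
# evaluated in the body is returned automatically
iris_fn_implicit <- function(df, species_type) {
  df %>% filter(Species == species_type)
}

# Both versions produce the same data frame
identical(iris_fn_return(iris, "setosa"), iris_fn_implicit(iris, "setosa"))
#> [1] TRUE
```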

2 Answers

From ?lapply: lapply(X, FUN, ...). Because you named every argument and none of them is X, lapply has no X to iterate over and pass to your function as its first argument.
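A minimal reproduction of the failure, using the built-in iris in place of iris1: the named df and species_type arguments are simply collected into lapply's `...`, and X is never supplied, so the first time lapply touches X it stops.

```r
library(dplyr)

iris_fn <- function(df, species_type) {
  df %>% filter(Species == species_type)
}

# Every argument is named and none of them is X, so df and species_type
# end up in lapply's `...` and X is missing
res <- tryCatch(
  lapply(df = iris, species_type = "setosa", FUN = iris_fn),
  error = function(e) conditionMessage(e)
)
res
#> [1] "argument \"X\" is missing, with no default"
```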

Try something like this:

library(dplyr)
iris1 <- iris

# note the changed argument order: species_type now comes first
iris_fn <- function(species_type, df) {
  df1 <- df %>%
    filter(Species == species_type)
  return(df1)
}

variables <- c("setosa", "versicolor", "virginica")

new_df_list <- lapply(variables, iris_fn, df = iris1)
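Spelled out, lapply() calls FUN(X[[i]], ...) for each element, forwarding extra named arguments such as df through `...`. A sketch using iris in place of iris1; via_lapply, by_hand and named_list are illustrative names, and setNames() is an optional extra if you want the list labelled by species.

```r
library(dplyr)

iris_fn <- function(species_type, df) {
  df %>% filter(Species == species_type)
}
variables <- c("setosa", "versicolor", "virginica")

# lapply(X, FUN, ...) calls FUN(X[[i]], ...) for each element, so this ...
via_lapply <- lapply(variables, iris_fn, df = iris)

# ... builds the same list as writing the three calls out by hand
by_hand <- list(
  iris_fn(variables[[1]], df = iris),
  iris_fn(variables[[2]], df = iris),
  iris_fn(variables[[3]], df = iris)
)
identical(via_lapply, by_hand)
#> [1] TRUE

# Optional: name the list elements after the species
named_list <- setNames(via_lapply, variables)
# names(named_list) is c("setosa", "versicolor", "virginica")
```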

Or with just an anonymous function:

new_df_list <- lapply(variables, \(x) filter(iris1, Species == x))
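Note that the `\(x)` lambda syntax was added in R 4.1.0 as shorthand for `function(x)`. A sketch (again using iris in place of iris1) showing that both spellings give the same list, in case you need to run this on an older R version:

```r
library(dplyr)

variables <- c("setosa", "versicolor", "virginica")

# \(x) is shorthand for function(x), available from R 4.1.0 onwards
short_form <- lapply(variables, \(x) filter(iris, Species == x))

# On older R versions, write the anonymous function out in full
long_form <- lapply(variables, function(x) filter(iris, Species == x))

identical(short_form, long_form)
#> [1] TRUE
```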

Since you already use the tidyverse, you could use purrr::map() instead:

library(purrr)
new_df_list <- map(variables, ~ filter(iris1, Species == .x))
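If you also want the resulting list named by species, one option (a sketch, not the only way) is to pipe the vector through purrr's set_names() first; called with no extra arguments it names a character vector after its own values, and map() keeps those names. This version reuses the reordered iris_fn from above and uses iris in place of iris1.

```r
library(dplyr)
library(purrr)

iris_fn <- function(species_type, df) {
  df %>% filter(Species == species_type)
}
variables <- c("setosa", "versicolor", "virginica")

# set_names() names the vector after itself, so map() returns a list
# with elements $setosa, $versicolor and $virginica
new_df_list <- variables %>%
  set_names() %>%
  map(~ iris_fn(.x, df = iris))

# Row count per species (50 each in iris)
map_int(new_df_list, nrow)
```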

Created on 2022-11-14 with reprex v2.0.2

margusl

lapply expects its main input in an argument called X, and your call never supplies one. One fix is to rewrite the function so that it takes X instead of species_type, e.g.:

iris_fn <- function(df, X) {
  df1 <- df %>% filter(Species == X)
  return(df1)
}
variables <- c("setosa", "versicolor", "virginica")

new_df <- lapply(X = variables, FUN = iris_fn, df = iris1)
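It may be worth noting that neither renaming the argument to X nor reordering the arguments is strictly required here: because df is passed by name, R matches it first, and each element of variables is then matched positionally to the remaining argument. A sketch with the question's original argument order (using iris in place of iris1):

```r
library(dplyr)

# The question's original function, argument order unchanged
iris_fn <- function(df, species_type) {
  df %>% filter(Species == species_type)
}
variables <- c("setosa", "versicolor", "virginica")

# lapply() calls iris_fn(variables[[i]], df = iris). The named argument df
# is matched first, so the unnamed element falls through to species_type
new_df <- lapply(variables, iris_fn, df = iris)

vapply(new_df, nrow, integer(1))
#> [1] 50 50 50
```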

EDIT: Alternatively, to avoid using X, make the first argument of the function the one that receives each element of the lapply input, e.g.:

iris_fn <- function(species_type, df) {
  df1 <- df %>% filter(Species == species_type)
  return(df1)
}
new_df <- lapply(variables, FUN = iris_fn, df = iris1)

Also check out the split function, a convenient way to split a data.frame into a list, e.g. split(iris, f = iris$Species).
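A quick sketch of the split() suggestion, in base R only. It returns a list named by factor level, so when you want every species there is no need for a separate vector of names or for lapply() at all. Unlike dplyr::filter(), split() keeps the original row names.

```r
# split() returns a named list with one data frame per level of f
by_species <- split(iris, f = iris$Species)

names(by_species)
# c("setosa", "versicolor", "virginica")

sapply(by_species, nrow)
# 50 rows for each species

# To keep only some species, subset the list by name
by_species[c("setosa", "virginica")]
```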

Jonny Phelps