I'm new to function writing so hopefully the below makes some sense.

I want to create a function which takes some arguments, which will be used to subset a data.frame. I have searched across the forums and found these Q&As interesting, but haven't been able to answer my question from the discussions:

The function I want to create will take a df, a column name, and a value to match in the rows of the column name. Here's my attempt which I can see to be wrong:

x <- data.frame("col1"=c("email","search","direct"),
            "col2"=c("direct","email","direct"),
            "col3"=c(10,15,27))

fun <- function(df,col,val) {
  result <- subset(df, col==val)
  return(result)
}

I want to pass in the df, x. A column name, let's say "col2". A value, let's say "email". My attempt to do so returns a 0-length df.

fun(x,"col2","email")

Clearly I'm doing something wrong... can anyone help?

Jonathan Mulligan
  • you should have a read at [this post](http://stackoverflow.com/questions/9860090/in-r-why-is-better-than-subset) to learn a bit more about issues with using `subset` inside a function. – Arun Jun 10 '13 at 11:32
  • I notice you do not use a lot of spaces in your code, e.g. `function(df,col,etc)` -> `function(df, col, etc)` or `col==val` -> `col == val`. Adding spaces makes your code easier to read, less intimidating. – Paul Hiemstra Jun 10 '13 at 11:34

1 Answer

You would want to do something like:

df[df[[col_name]] == value,]

The function then becomes:

fun <- function(df, col_name, value) {
  df[df[[col_name]] == value,]
}
fun(x, 'col2', 'email')
    col1  col2 col3
2 search email   15

And if you want to handle NA values in the logical vector (treating them as non-matches):

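fun <- function(df, col_name, value) {
  # TRUE where the column equals value; NA where the column is NA
  logical_vector <- df[[col_name]] == value
  # Treat NA comparisons as non-matches
  logical_vector[is.na(logical_vector)] <- FALSE
  # Note the empty column index: rows are selected, all columns are kept
  df[logical_vector, , drop = FALSE]
}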
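
To see why the NA handling matters, here is a minimal sketch. The data frame `y` and the helper name `filter_rows` are made up for illustration, and it uses `which()` as an alternative way to drop NA comparisons rather than the explicit `is.na()` replacement shown above.

```r
# Hypothetical data: same shape as x, but with a missing value in col2
y <- data.frame(col1 = c("email", "search", "direct"),
                col2 = c("direct", NA, "email"),
                col3 = c(10, 15, 27))

# Plain logical indexing: the NA comparison produces an extra all-NA row
naive <- y[y[["col2"]] == "email", ]

# which() returns only the positions that are TRUE, dropping FALSE and NA
filter_rows <- function(df, col_name, value) {
  keep <- which(df[[col_name]] == value)
  df[keep, , drop = FALSE]
}
safe <- filter_rows(y, "col2", "email")

nrow(naive)  # 2: one all-NA row plus the real match
nrow(safe)   # 1: only the real match
```

Either approach works; `which()` is just shorter when all you need is to discard the NA comparisons.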

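Your example does not work because subset does not look inside the value of col. Instead, it evaluates the expression col == val as written: it looks for a column called col in df, finds none, and falls back to the function argument col, which holds the string "col2". val is resolved the same way, so the comparison becomes "col2" == "email", which is FALSE, and no rows are selected. This is one of the reasons to avoid subset in non-interactive code, i.e. anywhere other than an interactive R console.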
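
As a rough illustration of the point about subset, the following sketch reproduces at top level what happens inside the original function. The variables `col` and `val` stand in for the function arguments, and the result names are made up for the example.

```r
x <- data.frame(col1 = c("email", "search", "direct"),
                col2 = c("direct", "email", "direct"),
                col3 = c(10, 15, 27))

# Stand-ins for the arguments passed to fun(x, "col2", "email")
col <- "col2"
val <- "email"

# x has no column named col, so subset() compares the two strings themselves
string_comparison <- col == val            # FALSE
no_rows <- subset(x, col == val)           # zero-row data frame

# Looking the column up by name with [[ gives the intended comparison
one_row <- subset(x, x[[col]] == val)

nrow(no_rows)  # 0
nrow(one_row)  # 1
```

Indexing with `[[` works because it takes the column name as an ordinary character value, so nothing depends on how the expression is written.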

Paul Hiemstra