
I have a data frame called "Region_Data", which I created by applying some functions to my data.

I want to use this data frame as an input and subset it using the following function that I created. The function should return the subset data frame:

Region_Analysis_Function <- function(Input_Region){
      Subset_Region_Data = subset(Region_Data, Region == "Input_Region" )
      Subset_Region_Data
    }

However, when I create this function and then execute it using:

Region_Analysis_Function("North West")

I get 0 observations (though I know that there are xx observations in the data frame).

I read that there is something called global / local environment, but I'm not really clear on that.

How do I solve this issue? Thank you so much in advance!!

user4918087
  • Try using `Region == Input_Region` (no quotes). Or better yet, you may want to use `Region %in% Input_Region` in case a non-atomic is passed to the function. – nrussell May 20 '15 at 16:29
  • Also read [this](http://stackoverflow.com/questions/9860090/in-r-why-is-better-than-subset) on the use of `subset` – user227710 May 20 '15 at 16:51

1 Answer

When you subset your data using subset(Region_Data, Region == "Input_Region"), "Input_Region" is interpreted as a string literal rather than evaluated to the value it represents. This means that unless the column Region in your object Region_Data contains some rows with the value "Input_Region", your function will return a zero-row subset. Removing the quotes will solve this, and changing == to %in% will make your function more general, because it will then also accept a vector of several values. Consider the following data set,

mydf <- data.frame(
  x = 1:5,
  y = rnorm(5),
  z = letters[1:5])
##
R> mydf
  x          y z
1 1 -0.4015449 a
2 2  0.4875468 b
3 3  0.9375762 c
4 4 -0.7464501 d
5 5  0.8802209 e

and the following 3 functions,

qfoo <- function(Z) {
  subset(mydf, z == "Z")
}
foo <- function(Z) {
  subset(mydf, z == Z)
}
##
bar <- function(Z) {
  subset(mydf, z %in% Z)
}

where qfoo represents the approach used in your question, foo implements the first change I noted, and bar implements both changes.
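The difference between the quoted and unquoted forms can also be seen outside of subset. This is only a small illustration; the variable Z here is a stand-in for the function argument of the same name:

```r
Z <- "c"

# Quoted: compares the literal string "Z" with "c"
"Z" == "c"
# [1] FALSE

# Unquoted: compares the value stored in Z with "c"
Z == "c"
# [1] TRUE
```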

The last two functions will work when the input value is a scalar,

R> qfoo("c")
[1] x y z
<0 rows> (or 0-length row.names)
##
R> foo("c")
  x         y z
3 3 0.9375762 c
##
R> bar("c")
  x         y z
3 3 0.9375762 c

but only the third will work if it is a vector:

R> foo(c("a","c"))
  x          y z
1 1 -0.4015449 a
Warning messages:
1: In is.na(e1) | is.na(e2) :
  longer object length is not a multiple of shorter object length
2: In `==.default`(z, Z) :
  longer object length is not a multiple of shorter object length
##
R> bar(c("a","c"))
  x          y z
1 1 -0.4015449 a
3 3  0.9375762 c
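Applied to the function from the question, the same fix might look like the sketch below. The contents of Region_Data are made up for illustration (only the Region column is known from the question), and the Data argument is an assumed addition so that the function does not have to rely on an object in the global environment. It uses `[` indexing rather than subset, in line with the link in the comments; subset(Data, Region %in% Input_Region) should behave the same way here.

```r
# Hypothetical stand-in for the asker's Region_Data; the real columns are unknown.
Region_Data <- data.frame(
  Region = c("North West", "South East", "North West", "London"),
  Sales  = c(10, 20, 30, 40),
  stringsAsFactors = FALSE
)

# Input_Region is unquoted, so it is evaluated to the value passed in.
# Data defaults to Region_Data but can be any data frame with a Region column.
Region_Analysis_Function <- function(Input_Region, Data = Region_Data) {
  Data[Data$Region %in% Input_Region, , drop = FALSE]
}

# A single region
Region_Analysis_Function("North West")

# Several regions at once, which works because of %in%
Region_Analysis_Function(c("North West", "London"))
```

With this made-up data, the first call keeps the two "North West" rows and the second keeps those plus the "London" row.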
nrussell