
I have quite a simple question which I am currently struggling with. If I have an example dataframe:

a <- c(1:5)  
b <- c(1,3,5,9,11)
df1 <- data.frame(a,b)

How do I create a new column ('c') that is populated using if statements on column b? For example: 'cat' for values of b that are 1 or 2; 'dog' for values of b between 3 and 5; 'rabbit' for values of b greater than 6.

So column 'c' using dataframe df1 would read: cat, dog, dog, rabbit, rabbit.

Many thanks in advance.

KT_1
  • This is asked quite often; see here for example: http://stackoverflow.com/questions/12379128/r-switch-statement-on-comparisons/12379251#12379251. – flodel Dec 02 '12 at 19:25
  • @flodel: +1 for using findInterval in your answer. As you say, this has been asked and answered many times. – IRTFM Dec 02 '12 at 21:56

3 Answers

df1$c <- c("cat", "dog", "rabbit")[ findInterval(df1$b, c(1, 2.5, 5.5, Inf)) ]

The findInterval approach will be much faster than nested ifelse strategies, and I'm guessing very much faster than a function that loops over unnested if statements. Those of us working with bigger data do notice the differences when we pick inefficient algorithms.

This doesn't literally use if statements as requested, but I don't think new users of R will always know the most expressive or efficient approach to a problem. A request to "use IF" sounded like an effort to translate coding approaches typical of the two major macro statistical processors, SPSS and SAS. R's if control structure is generally not an efficient way to recode a column, because its condition is expected to be a single TRUE or FALSE: older versions of R used only the first element of a longer vector (with a warning), and since R 4.2.0 such a condition is an error. On its own, if doesn't process a column, whereas the vectorized ifelse function does. The cut function could also have been used here (with appropriate breaks and labels arguments), although it would have returned a factor rather than a character vector. The findInterval approach was chosen for its ability to return multiple levels, which a single ifelse cannot. I think chaining or nesting ifelse calls quickly becomes ugly and confusing after about 2 or 3 levels.
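To make the findInterval one-liner easier to follow, here is a sketch that splits it into steps. It assumes the question's df1 and uses the labels in the order the question's categories require (cat, dog, rabbit). findInterval() returns, for each value, the index i of the interval with breaks[i] <= value < breaks[i + 1], and that index then picks a label. Values below the first break get index 0, which subsetting silently drops, so the sketch guards against that case.

```r
a <- 1:5
b <- c(1, 3, 5, 9, 11)
df1 <- data.frame(a, b)

breaks <- c(1, 2.5, 5.5, Inf)        # boundaries between the categories
labels <- c("cat", "dog", "rabbit")  # one label per interval

# Interval index for each value: breaks[i] <= b < breaks[i + 1]
idx <- findInterval(df1$b, breaks)
print(idx)  # expected: 1 2 2 3 3

# Index 0 (a value below the first break) would be silently dropped by
# the subsetting below, so stop early if that happens
stopifnot(all(idx >= 1))

df1$c <- labels[idx]
print(df1)
```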

IRTFM
  • Anyone from the future stumbling across this answer, there's a missing close parenthesis before the close square bracket. – Daniel Jan 22 '16 at 13:55
  • @Daniel: Thank you for pointing out that error. It remained here for over a year after your note until someone corrected it. They inappropriately left an explanation in the text of the answer body, which I replaced with the second paragraph. – IRTFM Feb 06 '17 at 20:58
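The first answer notes that cut() could be used with appropriate breaks and labels arguments. A minimal sketch of that, assuming the question's df1 and reusing the boundaries 1, 2.5, 5.5 and Inf from the findInterval line: cut() builds right-closed intervals by default, so right = FALSE is passed here to get the same left-closed intervals as findInterval(), and as.character() turns the factor it returns into a character column.

```r
a <- 1:5
b <- c(1, 3, 5, 9, 11)
df1 <- data.frame(a, b)

# Left-closed intervals [1, 2.5), [2.5, 5.5), [5.5, Inf), matching findInterval()
f <- cut(df1$b,
         breaks = c(1, 2.5, 5.5, Inf),
         labels = c("cat", "dog", "rabbit"),
         right = FALSE)

print(levels(f))          # the factor keeps all three categories as levels
df1$c <- as.character(f)  # convert the factor to a plain character column
print(df1)
```

Keeping the factor instead of converting it can be useful when the category order or empty categories matter (for example in table()).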
df1 <- 
    transform(
        df1 ,
        c =
            ifelse( b %in% 1:2 , 'cat' ,
            ifelse( b %in% 3:5 , 'dog' , 'rabbit' ) ) )
Brandon Bertelsen
Anthony Damico
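One caveat with the nested ifelse answer: b %in% 3:5 tests membership in the set {3, 4, 5}, so a non-integer value such as 4.5 is not matched and falls through to 'rabbit'. If b can hold non-integer values, range comparisons avoid this. The sketch below assumes cut points of 2 and 5 (b <= 2 is 'cat', b <= 5 is 'dog', anything larger is 'rabbit'); the question leaves the gaps between its ranges undefined, so treat these boundaries as an assumption. The vector x is a hypothetical example.

```r
a <- 1:5
b <- c(1, 3, 5, 9, 11)
df1 <- data.frame(a, b)

# Range comparisons instead of %in%, so non-integer values are classified too.
# Assumed cut points: b <= 2 -> "cat", 2 < b <= 5 -> "dog", b > 5 -> "rabbit"
df1 <- transform(
  df1,
  c = ifelse(b <= 2, "cat",
      ifelse(b <= 5, "dog", "rabbit"))
)
print(df1)

# Contrast on a hypothetical vector containing a non-integer value
x <- c(2, 4.5, 7)
with_in    <- ifelse(x %in% 1:2, "cat", ifelse(x %in% 3:5, "dog", "rabbit"))
with_range <- ifelse(x <= 2, "cat", ifelse(x <= 5, "dog", "rabbit"))
print(with_in)     # 4.5 is not in 3:5, so it falls through to "rabbit"
print(with_range)
```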

Although ifelse() is useful, it sometimes doesn't behave as one would intuitively expect, so I like to write the logic out explicitly.

a <- c(1:5)  
b <- c(1,3,5,9,11)
df1 <- data.frame(a,b)

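species <- function(x) {
  # sapply() passes one value at a time, so scalar if/else with || and && is safe
  if (x == 1 || x == 2) "cat"
  else if (x > 2 && x < 6) "dog"
  else if (x > 6) "rabbit"
  else NA_character_  # values no rule covers (e.g. 6) would otherwise cause an error
}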

df1$c <- sapply(df1$b,species)
Brandon Bertelsen
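For the sapply() approach, a possible refinement (offered as a sketch, not as part of the original answer) is vapply() with FUN.VALUE = character(1). It checks that every call returns exactly one string, so a rule that returns something else fails loudly instead of sapply() simplifying the result to something unexpected, such as a list. species() is restated so the block runs on its own, using an if/else if chain with scalar || and &&. Returning NA_character_ for values that no rule covers (such as 6) is an assumption.

```r
a <- 1:5
b <- c(1, 3, 5, 9, 11)
df1 <- data.frame(a, b)

# Scalar version of the question's rules; called once per value
species <- function(x) {
  if (x == 1 || x == 2) "cat"
  else if (x > 2 && x < 6) "dog"
  else if (x > 6) "rabbit"
  else NA_character_  # assumed fallback for values no rule covers, such as 6
}

# vapply() errors if any call does not return exactly one character string
df1$c <- vapply(df1$b, species, FUN.VALUE = character(1))
print(df1)
```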