Populate a column using if statements in R

Question

I have quite a simple question which I am currently struggling with. If I have an example dataframe:

a <- c(1:5)  
b <- c(1,3,5,9,11)
df1 <- data.frame(a,b)

How do I create a new column ('c') that is populated using if statements on column b? For example:

'cat' for those values in b which are 1 or 2
'dog' for those values in b which are between 3 and 5
'rabbit' for those values in b which are greater than 6

So column 'c' using dataframe df1 would read: cat, dog, dog, rabbit, rabbit.

Many thanks in advance.

This is asked quite often, look here for example: http://stackoverflow.com/questions/12379128/r-switch-statement-on-comparisons/12379251#12379251. — flodel, Dec 02 '12 at 19:25
@flodel: +1 for using findInterval in your answer. As you say, this has been asked and answered many times. — IRTFM, Dec 02 '12 at 21:56

IRTFM · Answer 1 · 2017-02-06T20:51:37.917

df1$c <- c("cat", "dog", "rabbit")[ findInterval(df1$b, c(1, 2.5, 5.5, Inf)) ]

The findInterval approach will be much faster than nested ifelse strategies, and I'm guessing very much faster than a function that loops over unnested if statements. Those of us working with bigger data do notice the differences when we pick inefficient algorithms.
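To make the lookup concrete, here is a small sketch of what findInterval returns for the question's data and how those positions index a label vector. The names breaks, idx and animal_labels are illustrative and not part of the original answer. It assumes every value in b is at least 1: a value below the first break would return 0, and a 0 index silently drops the element.

```r
# Question's example data
a <- c(1:5)
b <- c(1, 3, 5, 9, 11)
df1 <- data.frame(a, b)

# Break points: [1, 2.5) -> 1, [2.5, 5.5) -> 2, [5.5, Inf) -> 3
breaks <- c(1, 2.5, 5.5, Inf)
animal_labels <- c("cat", "dog", "rabbit")

# findInterval returns, for each value, the index of the interval it falls in
idx <- findInterval(df1$b, breaks)
idx
# 1 2 2 3 3

# Use those indices to look up the label for each row
df1$c <- animal_labels[idx]
df1$c
# "cat" "dog" "dog" "rabbit" "rabbit"
```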

This didn't actually address the request to use if statements, but I don't think new users of R will always know the most expressive or efficient approach to a problem. A request to "use IF" sounded like an effort to translate coding approaches typical of the two major macro statistical processors, SPSS and SAS. The R if control structure is not generally an efficient approach to recoding a column, because its condition must be a single logical value: older versions of R used only the first element of a longer condition (with a warning), and R 4.2.0 and later treat a longer condition as an error. On its own, if doesn't process a column, whereas the ifelse function does. The cut function could have been used here (with appropriate breaks and labels arguments), although it would have returned a factor instead of a character vector. The findInterval approach was chosen for its ability to return multiple levels (which a single ifelse cannot). I think chaining or nesting ifelse calls quickly becomes ugly and confusing after about 2 or 3 levels of nesting.
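For comparison, a minimal sketch of the cut alternative mentioned above. The break points and the variable name animal are assumptions chosen to match the question's categories. It shows that the result is a factor, which as.character converts when a character column is wanted.

```r
b <- c(1, 3, 5, 9, 11)

# Intervals are right-closed by default: (-Inf, 2.5], (2.5, 5.5], (5.5, Inf]
animal <- cut(b,
              breaks = c(-Inf, 2.5, 5.5, Inf),
              labels = c("cat", "dog", "rabbit"))

class(animal)
# "factor"

# Convert to a plain character vector if a factor is not wanted
as.character(animal)
# "cat" "dog" "dog" "rabbit" "rabbit"
```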

Anyone from the future stumbling across this answer, there's a missing close parenth before the close square bracket. — Daniel, Jan 22 '16 at 13:55
@Daniel: Thank you for pointing out that error. It waited here for over a year beyond your notation until someone corrected it. They inappropriately left an explanation in the text of the answer body, which I replaced with the second paragraph. — IRTFM, Feb 06 '17 at 20:58

score 2 · Answer 2 · edited Dec 02 '12 at 19:30

df1 <- 
    transform(
        df1 ,
        c =
            ifelse( b %in% 1:2 , 'cat' ,
            ifelse( b %in% 3:5 , 'dog' , 'rabbit' ) ) )

edited Dec 02 '12 at 19:30 by Brandon Bertelsen
answered Dec 02 '12 at 19:27 by Anthony Damico

score 2 · Answer 3 · answered Dec 02 '12 at 19:32

Although ifelse() is useful, sometimes it doesn't provide what one would intuitively expect. So, I like to write it out.

a <- c(1:5)  
b <- c(1,3,5,9,11)
df1 <- data.frame(a,b)

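species <- function(x) {
  # Scalar classifier: each if tests a single value, so use || and &&
  if (x == 1 || x == 2) {
    "cat"
  } else if (x > 2 && x < 6) {
    "dog"
  } else if (x > 6) {
    "rabbit"
  } else {
    NA_character_  # no rule was given for other values (e.g. exactly 6)
  }
}

# sapply calls species() once per element of column b
df1$c <- sapply(df1$b, species)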
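One commonly cited way ifelse() can surprise people (not necessarily the case this answer had in mind) is that its result takes its length from the test argument, not from the yes and no values. A short sketch, with illustrative variable names:

```r
# The test has length 1, so only the first element of 'yes' is returned
one <- ifelse(TRUE, c("cat", "dog"), c("rabbit", "fish"))
one
# "cat"

# With a length-2 test, each position picks from 'yes' or 'no' at that position
two <- ifelse(c(TRUE, FALSE), c("cat", "dog"), c("rabbit", "fish"))
two
# "cat" "fish"
```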
