71

I'm still learning how to translate SAS code into R, and I get warnings. I need to understand where I'm making mistakes. What I want to do is create a variable that summarizes and distinguishes 3 statuses of a population: mainland, overseas, foreigner. I have a database with 2 variables:

  • id nationality: idnat (french, foreigner),

If idnat is french then:

  • id birthplace: idbp (mainland, colony, overseas)

I want to summarize the info from idnat and idbp into a new variable called idnat2:

  • status: k (mainland, overseas, foreigner)

All these variables use "character type".

Results expected in column idnat2 :

   idnat     idbp   idnat2
1  french mainland mainland
2  french   colony overseas
3  french overseas overseas
4 foreign  foreign  foreign

Here is my SAS code I want to translate in R:

if idnat = "french" then do;
   if idbp in ("overseas","colony") then idnat2 = "overseas";
   else idnat2 = "mainland";
end;
else idnat2 = "foreigner";
run;

Here is my attempt in R:

if(idnat=="french"){
    idnat2 <- "mainland"
} else if(idbp=="overseas"|idbp=="colony"){
    idnat2 <- "overseas"
} else {
    idnat2 <- "foreigner"
}

I receive this warning:

Warning message:
In if (idnat=="french") { :
  the condition has length > 1 and only the first element will be used

I was advised to use a "nested ifelse" instead because it is easier, but I get more warnings:

idnat2 <- ifelse (idnat=="french", "mainland",
        ifelse (idbp=="overseas"|idbp=="colony", "overseas")
      )
            else (idnat2 <- "foreigner")

According to the warning message, the length is greater than 1, so only what's between the first brackets will be taken into account. Sorry, but I don't understand what this length has to do with it. Does anybody know where I went wrong?

Saranjith
balour
  • 7
    You shouldn't mix `ifelse` and `else`. – Roland Aug 02 '13 at 08:43
  • 1
    @ Roland You're right thanks for the advice, I've just put the results. What I want is just in the column idnat2 if it makes it clear. @KarlForner thank you that's exactly what I'm trying to do with simple examples however I'm really struggling with "R". I've tried to do the same on SPSS and it was more simple. – balour Aug 02 '13 at 09:48
  • My point is that SO is not a replacement for learning a language. There are plenty of books, tutorials... You should post here when you are stuck, and you have used all other resources. Best. – Karl Forner Aug 02 '13 at 13:31
  • 7
    @KarlForner I agree with you completely. However, in this specific case (`if` vs. `ifelse`) I upvoted the question as I had exactly same problems when I started using R. It was not clear from [Introduction to R](http://cran.r-project.org/doc/manuals/r-release/R-intro.html), nothing about `ifelse` in [R Language Definition](http://cran.r-project.org/doc/manuals/r-release/R-lang.html), there are couple of examples in R For Dummies. Any other sources describing differences between `if` and `ifelse`? – Tomas Greif Aug 02 '13 at 18:01

10 Answers

141

If you are using any spreadsheet application there is a basic function if() with syntax:

if(<condition>, <yes>, <no>)

Syntax is exactly the same for ifelse() in R:

ifelse(<condition>, <yes>, <no>)

The only difference from if() in a spreadsheet application is that R's ifelse() is vectorized (it takes vectors as input and returns a vector as output). Consider the following comparison of formulas in a spreadsheet application and in R, for an example where we want to test whether a > b and return 1 if so and 0 if not.

In spreadsheet:

  A  B C
1 3  1 =if(A1 > B1, 1, 0)
2 2  2 =if(A2 > B2, 1, 0)
3 1  3 =if(A3 > B3, 1, 0)

In R:

> a <- 3:1; b <- 1:3
> ifelse(a > b, 1, 0)
[1] 1 0 0

ifelse() can be nested in many ways:

ifelse(<condition>, <yes>, ifelse(<condition>, <yes>, <no>))

ifelse(<condition>, ifelse(<condition>, <yes>, <no>), <no>)

ifelse(<condition>, 
       ifelse(<condition>, <yes>, <no>), 
       ifelse(<condition>, <yes>, <no>)
      )

ifelse(<condition>, <yes>, 
       ifelse(<condition>, <yes>, 
              ifelse(<condition>, <yes>, <no>)
             )
       )

To calculate column idnat2 you can:

df <- read.table(header=TRUE, text="
idnat idbp idnat2
french mainland mainland
french colony overseas
french overseas overseas
foreign foreign foreign"
)

with(df, 
     ifelse(idnat=="french",
       ifelse(idbp %in% c("overseas","colony"),"overseas","mainland"),"foreign")
     )

R Documentation

What does "the condition has length > 1 and only the first element will be used" mean? (From R 4.2.0 on, the same situation is an error rather than a warning.) Let's see:

> # What is first condition really testing?
> with(df, idnat=="french")
[1]  TRUE  TRUE  TRUE FALSE
> # This is result of vectorized function - equality of all elements in idnat and 
> # string "french" is tested.
> # Vector of logical values is returned (has the same length as idnat)
> df$idnat2 <- with(df,
+   if(idnat=="french"){
+   idnat2 <- "xxx"
+   }
+   )
Warning message:
In if (idnat == "french") { :
  the condition has length > 1 and only the first element will be used
> # Note that the first element of the comparison is TRUE, and that's why we get:
> df
    idnat     idbp idnat2
1  french mainland    xxx
2  french   colony    xxx
3  french overseas    xxx
4 foreign  foreign    xxx
> # There really is logic in it; you just have to get used to it

Can I still use if()? Yes, you can, but the syntax is not so cool :)

test <- function(x) {
  if(x=="french") {
    "french"
  } else{
    "not really french"
  }
}

apply(array(df[["idnat"]]),MARGIN=1, FUN=test)
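A shorter way to run a scalar `if`/`else` over each element is `vapply()`. This is only a sketch: the `idnat` column is rebuilt here from the question's data so the snippet runs on its own, and `test` is the function from above written on one line.

```r
# Assumed sample data, copied from the question
df <- data.frame(idnat = c("french", "french", "french", "foreign"),
                 stringsAsFactors = FALSE)

# Scalar test: works on one value at a time
test <- function(x) if (x == "french") "french" else "not really french"

# vapply() calls test() once per element and checks that each result
# is a single character string
res <- vapply(df$idnat, test, character(1), USE.NAMES = FALSE)
res
```

For a simple recode like this, `ifelse()` is still the more direct tool; `vapply()` is mainly useful when the per-element logic is too involved for a single vectorized expression.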

If you are familiar with SQL, you can also use CASE statement in sqldf package.

Tomas Greif
13

Try something like the following:

# some sample data
idnat <- sample(c("french","foreigner"),100,TRUE)
idbp <- rep(NA,100)
idbp[idnat=="french"] <- sample(c("mainland","overseas","colony"),sum(idnat=="french"),TRUE)

# recoding
out <- ifelse(idnat=="french" & !idbp %in% c("overseas","colony"), "mainland",
              ifelse(idbp %in% c("overseas","colony"),"overseas",
                     "foreigner"))
cbind(idnat,idbp,out) # check result

Your confusion comes from how SAS and R handle if-else constructions. In R, if and else are not vectorized: they check a single condition (i.e., if("french"=="french") works) and cannot handle a vector of logicals (i.e., if(c("french","foreigner")=="french") looks only at the first element), which is why R gives you the warning you're receiving.

By contrast, ifelse is vectorized, so it can take your vectors (aka input variables) and test the logical condition on each of their elements, like you're used to in SAS. An alternative way to wrap your head around this would be to build a loop using if and else statements (as you've started to do here), but the vectorized ifelse approach will be more efficient and generally involve less code.
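The distinction can be sketched in a few lines. The three-element `idnat` vector below is made up for illustration; the point is that `if` wants exactly one TRUE/FALSE, while `ifelse()` works through the whole vector.

```r
# Made-up example vector
idnat <- c("french", "french", "foreign")

# A length-one condition is what `if` expects
if (idnat[1] == "french") first <- "french" else first <- "other"

# The full comparison is a logical vector, one value per element
cond <- idnat == "french"
length(cond)

# To use it in `if`, collapse it to a single value first, e.g. with all() or any()
if (all(cond)) msg <- "everyone is french" else msg <- "mixed"

# To recode element by element, use ifelse()
recoded <- ifelse(cond, "french", "other")
recoded
```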

Thomas
  • Hello, alright IF and ELSE in R are not vectorised so that's why I got the warning about length > 1 and only the 1st TRUE argument recorded. I'm going to try your hint about IFELSE it seems like it's more efficient though Tomas greif's one also. – balour Aug 02 '13 at 13:19
9

If the data set contains many rows it might be more efficient to join with a lookup table using data.table instead of nested ifelse().

Provided the lookup table below

lookup
     idnat     idbp   idnat2
1:  french mainland mainland
2:  french   colony overseas
3:  french overseas overseas
4: foreign  foreign  foreign

and a sample data set

library(data.table)
n_row <- 10L
set.seed(1L)
DT <- data.table(idnat = "french",
                 idbp = sample(c("mainland", "colony", "overseas", "foreign"), n_row, replace = TRUE))
DT[idbp == "foreign", idnat := "foreign"][]
      idnat     idbp
 1:  french   colony
 2:  french   colony
 3:  french overseas
 4: foreign  foreign
 5:  french mainland
 6: foreign  foreign
 7: foreign  foreign
 8:  french overseas
 9:  french overseas
10:  french mainland

then we can do an update while joining:

DT[lookup, on = .(idnat, idbp), idnat2 := i.idnat2][]
      idnat     idbp   idnat2
 1:  french   colony overseas
 2:  french   colony overseas
 3:  french overseas overseas
 4: foreign  foreign  foreign
 5:  french mainland mainland
 6: foreign  foreign  foreign
 7: foreign  foreign  foreign
 8:  french overseas overseas
 9:  french overseas overseas
10:  french mainland mainland
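The same lookup idea can be sketched in base R, without data.table, by matching on a combined key. This is not the data.table update join itself, only an illustration of the principle; the helper `make_key` and the small `dat` data frame are hypothetical names introduced here.

```r
# Lookup table with the same content as the one shown above
lookup <- data.frame(
  idnat  = c("french", "french", "french", "foreign"),
  idbp   = c("mainland", "colony", "overseas", "foreign"),
  idnat2 = c("mainland", "overseas", "overseas", "foreign"),
  stringsAsFactors = FALSE
)

# Hypothetical data to recode
dat <- data.frame(
  idnat = c("french", "foreign", "french", "french"),
  idbp  = c("colony", "foreign", "mainland", "overseas"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: one key string per row, built from both columns
make_key <- function(d) paste(d$idnat, d$idbp, sep = "|")

# match() returns, for each row of dat, the matching row of lookup
# (NA if a combination is missing from the lookup table)
dat$idnat2 <- lookup$idnat2[match(make_key(dat), make_key(lookup))]
dat
```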
Uwe
8

You can create the vector idnat2 without if and ifelse.

The function replace can be used to replace all occurrences of "colony" with "overseas":

idnat2 <- replace(idbp, idbp == "colony", "overseas")
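Applied to the four rows from the question, this could look as follows. The sketch assumes, as in the expected-results table, that foreigners have "foreign" in idbp, so idbp only needs "colony" recoded.

```r
# Data as shown in the question's expected-results table
idnat <- c("french", "french", "french", "foreign")
idbp  <- c("mainland", "colony", "overseas", "foreign")

# Copy idbp and overwrite only the "colony" entries
idnat2 <- replace(idbp, idbp == "colony", "overseas")
idnat2
# "mainland" "overseas" "overseas" "foreign"
```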
Sven Hohenstein
  • 1
    more or less the same: `df$idnat2 <- df$idbp; df$idnat2[df$idbp == 'colony'] <- 'overseas'` – Jaap Oct 06 '17 at 13:39
6

Using the SQL CASE statement with the sqldf package, or case_when() with the dplyr package:

Data

df <-structure(list(idnat = structure(c(2L, 2L, 2L, 1L), .Label = c("foreign", 
"french"), class = "factor"), idbp = structure(c(3L, 1L, 4L, 
2L), .Label = c("colony", "foreign", "mainland", "overseas"), class = "factor")), .Names = c("idnat", 
"idbp"), class = "data.frame", row.names = c(NA, -4L))

sqldf

library(sqldf)
sqldf("SELECT idnat, idbp,
        CASE 
          WHEN idbp IN ('colony', 'overseas') THEN 'overseas' 
          ELSE idbp 
        END AS idnat2
       FROM df")

dplyr

library(dplyr)
df %>% 
  mutate(idnat2 = case_when(idbp == 'mainland' ~ "mainland", 
                            idbp %in% c("colony", "overseas") ~ "overseas", 
                            TRUE ~ "foreign"))

Output

    idnat     idbp   idnat2
1  french mainland mainland
2  french   colony overseas
3  french overseas overseas
4 foreign  foreign  foreign
mpalanco
2

With data.table, the solution is:

DT[, idnat2 := ifelse(idbp %in% "foreign", "foreign", 
        ifelse(idbp %in% c("colony", "overseas"), "overseas", "mainland" ))]

The ifelse is vectorized. The if-else is not. Here, DT is:

    idnat     idbp
1  french mainland
2  french   colony
3  french overseas
4 foreign  foreign

This gives:

   idnat     idbp   idnat2
1:  french mainland mainland
2:  french   colony overseas
3:  french overseas overseas
4: foreign  foreign  foreign
Sun Bee
  • imo a better way would be: `DT[, idnat2 := idbp][idbp %in% c('colony','overseas'), idnat2 := 'overseas']` – Jaap Sep 19 '16 at 13:48
  • 2
    or even better: `DT[, idnat2 := idbp][idbp == 'colony', idnat2 := 'overseas']` – Jaap Sep 19 '16 at 13:49
  • Another `data.table` way would be to join with a lookup table: `DT[lookup, on = .(idnat, idbp), idnat2 := i.idnat2][]` – Uwe Sep 29 '17 at 09:56
1
# Read in the data.

idnat=c("french","french","french","foreign")
idbp=c("mainland","colony","overseas","foreign")

# Initialize the new variable.

idnat2 <- character(length(idnat))

# Evaluate "idnat" and "idbp" for each case, assigning the appropriate level to "idnat2".
# Inside if(), use the scalar operators && and || on single elements.

for (i in seq_along(idnat)) {
  if (idnat[i] == "french" && idbp[i] == "mainland") {
    idnat2[i] <- "mainland"
  } else if (idnat[i] == "french" && (idbp[i] == "colony" || idbp[i] == "overseas")) {
    idnat2[i] <- "overseas"
  } else {
    idnat2[i] <- "foreign"
  }
}

# Create a data frame with the two old variables and the new variable.

data.frame(idnat,idbp,idnat2) 
Azul
1

The explanation with the examples was key to helping me, but when I copied the code it didn't work, so I had to adjust it in several ways to get it to work. (I'm very new to R and had some trouble with the third ifelse due to lack of knowledge.)

So, for those who are new to R and running into issues:

   ifelse(x < -2,"pretty negative", ifelse(x < 1,"close to zero", ifelse(x < 3,"in [1, 3)","large")##all one line
     )#normal tab
)

(I used this inside a function, so the "ifelse..." line was indented one tab, but the last ")" was all the way to the left.)

Tiffany T
  • 1
    Just an FYI--when doing numeric bins `cut` can be nicer. You could rewrite this as `cut(x, breaks = c(-Inf, -2, 1, 3, Inf), labels = c("pretty negative", "close to zero", "in [1, 3)", "large"))`. If it's just one or two nestings `ifelse` is just as good, but if you have to go deeper `cut` can be a nice relief to not need to track all the nesting and parentheses. – Gregor Thomas Feb 12 '20 at 21:42
  • Thank you, I haven't worked with cut. It seems to break things into (-inf,-2],(-2,1],(1,3],(3,inf], so this would work well as long as the intervals are of the form "x <= some Z". I tested inverting breaks and labels, labels only, and breaks only, but it didn't give me the intervals [-inf,-2),[-2,1),[1,3),[3,inf) that I needed. But for real-world applications, cut seems to be better. – Tiffany T Feb 13 '20 at 23:17
  • `cut` also has an argument `right` (defaults to TRUE) indicating the intervals are closed on the right. Setting `right = FALSE` will give you `[-inf,-2),[-2,1),[1,3),[3,inf)`. Doesn't come in to play with the `-Inf` and `Inf` bounds here, but you can also use `include.lowest` to toggle whether both extremes are closed or not. See `?cut` for more details. – Gregor Thomas Feb 14 '20 at 01:20
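The `right = FALSE` suggestion from the comments can be checked against the nested `ifelse()` from the answer. A small sketch, with a made-up `x` that includes the boundary values -2, 1 and 3:

```r
# Made-up test values, including the interval boundaries
x <- c(-3, -2, 0, 1, 2, 3, 5)
labs <- c("pretty negative", "close to zero", "in [1, 3)", "large")

# Nested ifelse() as in the answer
nested <- ifelse(x < -2, labs[1],
                 ifelse(x < 1, labs[2],
                        ifelse(x < 3, labs[3], labs[4])))

# cut() with left-closed intervals: [-Inf,-2), [-2,1), [1,3), [3,Inf)
binned <- cut(x, breaks = c(-Inf, -2, 1, 3, Inf), labels = labs, right = FALSE)

# cut() returns a factor, so compare the labels as character strings
identical(as.character(binned), nested)
# TRUE
```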
0

I put together a function for nesting if-else statements. Not optimized for speed. Thought it might be useful for others.

ifelse_nested <- function(...) {
  args <- list(...)
  nargs <- length(args)
  
  default_ind <- nargs
  condition_inds <- which(seq_len(nargs) %% 2 == 1)
  condition_inds <- condition_inds[-length(condition_inds)] # remove default_ind
  value_inds <- which(seq_len(nargs) %% 2 == 0)
  
  .init <- args[[default_ind]]
  .x <- mapply(
    function(icond_ind, ivalue_ind) {
      return(list(condition=args[[icond_ind]], value=args[[ivalue_ind]]))
    }
    , icond_ind=condition_inds
    , ivalue_ind=value_inds
    , SIMPLIFY = FALSE
  ) # generate pairs of conditions & resulting-values
  
  out <- Reduce(
    function(x, y) ifelse(x$condition, x$value, y)
    , x = .x
    , init=.init
    , right=TRUE
  )
  
  return(out)
}

For example:

x <- seq_len(10)
ifelse_nested(x%%2==0, 2,x%%3==0, x^2, 0)
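For comparison, the call above reads as condition/value pairs followed by a default, so it should be equivalent to this hand-nested `ifelse()` (a sketch for checking the helper, not part of it):

```r
x <- seq_len(10)

# Even numbers -> 2; otherwise multiples of 3 -> x^2; everything else -> 0
res <- ifelse(x %% 2 == 0, 2,
              ifelse(x %% 3 == 0, x^2, 0))
res
# 0 2 9 2 0 2 0 2 81 2
```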
-1

Sorry for joining too late to the party. Here's an easy solution.

#building up your initial table
idnat <- c(1,1,1,2) #1 is french, 2 is foreign

idbp <- c(1,2,3,4) #1 is mainland, 2 is colony, 3 is overseas, 4 is foreign

t <- cbind(idnat, idbp)

# the last column will be a vector with one entry per row of the matrix
idnat2 <- vector()

# ... and we will populate that vector inside a loop;
# the loop index runs over the length of one of the vectors
for (i in seq_along(idnat)) {
  if (t[i, 1] == 2) {           # if idnat = foreign, then it's foreign
    idnat2[i] <- 3              # 3 is foreign
  } else if (t[i, 2] == 1) {    # if not foreign and idbp = mainland, then it's mainland
    idnat2[i] <- 2              # 2 is mainland
  } else {                      # anything else is classified as colony or overseas
    idnat2[i] <- 1              # 1 is colony or overseas
  }
}


cbind(t,idnat2)
Valentin_Ștefan
Jorge Lopez
  • 1
    Straightforward, yes. But also verbose and non-idiomatic... And not illustrated very well (why use these integers instead of the data provided in the question?) And duplicative of Azul's answer, which uses basically the same approach but on the text data from the question rather than integers... – Gregor Thomas Jan 10 '19 at 14:05
  • Because I felt like doing it that way, Gregor. See that? In how many beautiful ways we can communicate... Azul's... Jorge's... Gregor's... – Jorge Lopez Feb 03 '19 at 03:58
  • It's up to OP to choose what seems more logical to him... as it is to you... as it is to me. Regards, Gregor. – Jorge Lopez Feb 03 '19 at 03:59