
I have a dataframe:

df <- data.frame('a'=c(1,2,3,4,5), 'b'=c(1,20,3,4,50))
df
    a    b
1   1    1
2   2   20
3   3    3
4   4    4
5   5   50

and I want to create a new column based on existing columns. Something like this:

if (df[['a']] == df[['b']]) {
  df[['c']] <- df[['a']] + df[['b']]
} else {
  df[['c']] <- df[['b']] - df[['a']]
}

The problem is that the `if` condition is evaluated only for the first row. If I wrap the above `if` statement in a function and then use `apply()` (or `mapply()`...), I get the same result.

In Python/pandas I can use this:

df['c'] = df[['a', 'b']].apply(lambda x: x['a'] + x['b'] if (x['a'] == x['b']) \
    else x['b'] - x['a'], axis=1)

I want something similar in R. So the result should look like this:

    a    b    c
1   1    1    2
2   2   20   18
3   3    3    6
4   4    4    8
5   5   50   45
zx8754
ragesz
  • The problem is that `if` only uses one entry of the logical vector produced by `==`, namely the first. The vectorized answer by @akrun should do the job. – JSN Aug 26 '16 at 11:36
  • Technically, you could also use something like `with(df, (a * c(-1L, 1L)[(a == b) +1L]) + b)` but it's not very intuitive – talat Aug 26 '16 at 11:45
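To see why only the first row counts, here is a minimal sketch using the example `df`: `==` is vectorized and returns one logical per row, but `if` needs a single `TRUE` or `FALSE`. R versions before 4.2.0 warn and use only the first element; from R 4.2.0 on, a condition of length greater than one is an error. The `tryCatch()` wrapper is only there so the sketch runs on either kind of R version, and `cond`/`msg` are illustrative names.

```r
df <- data.frame(a = c(1, 2, 3, 4, 5), b = c(1, 20, 3, 4, 50))

# `==` compares element by element: one logical value per row
cond <- df[["a"]] == df[["b"]]
print(cond)          # TRUE FALSE TRUE TRUE FALSE
print(length(cond))  # 5, but `if` expects a condition of length 1

# Capture what `if` signals for a length-5 condition:
# a warning in R < 4.2.0, an error in R >= 4.2.0
msg <- tryCatch(
  if (cond) "branch taken" else "branch skipped",
  warning = function(w) conditionMessage(w),
  error = function(e) conditionMessage(e)
)
print(msg)
```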

5 Answers

One option is `ifelse`, which is a vectorized version of `if/else`. The row-wise `if/else` from the OP's pandas code could be done in R with a `for` loop or with `lapply`/`sapply`, but that would be inefficient compared with a single vectorized call.

df <- transform(df, c= ifelse(a==b, a+b, b-a))
df
#  a  b  c
#1 1  1  2
#2 2 20 18
#3 3  3  6
#4 4  4  8
#5 5 50 45
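Conceptually, `ifelse(test, yes, no)` builds the full `yes` and `no` vectors and then picks, element by element, the value matching `test`. The sketch below does that selection by hand with logical indexing (a plain alternative to `ifelse`, not its actual source code), assuming the comparison contains no `NA` values; `test`, `yes`, `no` and `manual` are illustrative names.

```r
df <- data.frame(a = c(1, 2, 3, 4, 5), b = c(1, 20, 3, 4, 50))

test <- df$a == df$b  # one logical per row (assumed to contain no NA)
yes  <- df$a + df$b   # values to use where test is TRUE
no   <- df$b - df$a   # values to use where test is FALSE

# Start from the "no" values and overwrite the rows where test is TRUE
manual <- no
manual[test] <- yes[test]

print(manual)                                    # 2 18 6 8 45
print(identical(manual, ifelse(test, yes, no)))  # TRUE
```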

This can also be written as

df$c <- with(df, ifelse(a == b, a + b, b - a))

which adds the 'c' column directly to the original dataset.
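For comparison, the row-by-row `for` loop mentioned above might look like the sketch below, together with a `vapply()` version over the row indices. Both give the same `c` as `ifelse()` on this data, but they run the R-level `if/else` once per row, which is why the vectorized form is preferred for larger data frames. The names `out` and `c2` are only illustrative.

```r
df <- data.frame(a = c(1, 2, 3, 4, 5), b = c(1, 20, 3, 4, 50))

# Explicit for loop: preallocate the result, then fill it one row at a time
out <- numeric(nrow(df))
for (i in seq_len(nrow(df))) {
  out[i] <- if (df$a[i] == df$b[i]) df$a[i] + df$b[i] else df$b[i] - df$a[i]
}
df$c <- out

# vapply() over the row indices; numeric(1) declares that each call
# must return a single number
df$c2 <- vapply(seq_len(nrow(df)), function(i) {
  if (df$a[i] == df$b[i]) df$a[i] + df$b[i] else df$b[i] - df$a[i]
}, numeric(1))

print(df)
```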


Since the OP wants something similar to the pandas row-wise `if/else`, here is a version with `apply`:

df$c <- apply(df, 1, FUN = function(x) if(x[1]==x[2]) x[1]+x[2] else x[2]-x[1])
akrun
  • Thank you! Could you please give an `apply()` (or `sapply()`, `mapply()`, `tapply()`, `lapply()`) version as well if it is possible (or a link with basic examples)? I want to understand their mechanism with this simple example (I have to `apply` more complex functions and conditions). Thanks a lot!! – ragesz Aug 26 '16 at 11:45
  • @ragesz If you want to understand where to use these functions, [this](http://stackoverflow.com/questions/3505701/r-grouping-functions-sapply-vs-lapply-vs-apply-vs-tapply-vs-by-vs-aggrega) could help you. – akrun Aug 26 '16 at 11:47
  • @ragesz When vectorized solutions are available, using slow `apply()` loops is a bad idea. One should not seek to use [a specific type of command to solve a problem](http://meta.stackexchange.com/a/66378). Instead it is important to learn which methods are suitable in which cases. The vectorized solutions in this answer show the correct way to solve the problem in R. – RHertel Aug 26 '16 at 12:02

Here is a slightly more confusing algebraic method:

df$c <- with(df, b + ((-1)^((a==b)+1) * a))

df
  a  b  c
1 1  1  2
2 2 20 18
3 3  3  6
4 4  4  8
5 5 50 45

The idea is that the sign in front of `a` is switched by the test `a == b`: it is a plus when the two values are equal and a minus otherwise.
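A small sketch that prints the intermediate steps of this trick on the example `df`: the logical test is coerced to a number when 1 is added, and raising -1 to that power gives the sign applied to `a`. The names `test`, `power`, `signs` and `steps` are only for illustration.

```r
df <- data.frame(a = c(1, 2, 3, 4, 5), b = c(1, 20, 3, 4, 50))

test  <- df$a == df$b  # TRUE FALSE TRUE TRUE FALSE
power <- test + 1      # TRUE -> 2, FALSE -> 1 (logical coerced to number)
signs <- (-1)^power    # (-1)^2 = 1 (add a), (-1)^1 = -1 (subtract a)

steps <- data.frame(a = df$a, b = df$b, test, power, signs,
                    c = df$b + signs * df$a)
print(steps)
```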

lmo
  • It's very nice, thank you! Actually, the point of my question was "creating a new column based on existing ones", and I just made up a simple example to demonstrate the problem. But your solution is very intuitive, and it helps me understand R a bit more (e.g. how R automatically converts booleans to integers). – ragesz Aug 26 '16 at 11:56

If you want an `apply`-style method, another option is to create a function and apply it with `mapply`:

fun1 <- function(x, y) if (x == y) {x + y} else {y-x}
df$c <- mapply(fun1, df$a, df$b)
df
#  a  b  c
#1 1  1  2
#2 2 20 18
#3 3  3  6
#4 4  4  8
#5 5 50 45
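`mapply()` calls `fun1(a[1], b[1])`, `fun1(a[2], b[2])`, and so on, then simplifies the results to a vector. Base R's `Vectorize()` wraps a scalar function in that kind of `mapply()` call, so the wrapped function can be used on whole columns. A sketch (`res_mapply`, `res_vec` and `fun1_vec` are illustrative names):

```r
df <- data.frame(a = c(1, 2, 3, 4, 5), b = c(1, 20, 3, 4, 50))
fun1 <- function(x, y) if (x == y) x + y else y - x

# mapply(): fun1 is applied to matching elements of df$a and df$b
res_mapply <- mapply(fun1, df$a, df$b)

# Vectorize() returns a wrapper that performs the same mapply() call
fun1_vec <- Vectorize(fun1)
res_vec <- fun1_vec(df$a, df$b)

print(res_mapply)  # 2 18 6 8 45
print(res_vec)     # 2 18 6 8 45
```

The wrapper is convenient, but it still loops over the rows internally, so it is no faster than calling `mapply()` directly.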
Sotos

Using the dplyr package:

library(dplyr)

df <- df %>% 
  mutate(c = if_else(a == b, a + b, b - a))

df
#   a  b  c
# 1 1  1  2
# 2 2 20 18
# 3 3  3  6
# 4 4  4  8
# 5 5 50 45
zx8754

A solution with `apply`:

myFunction <- function(x){
  a <- x[1]
  b <- x[2]
  #further values ignored (if there are more than 2 columns)
  value <- if(a==b) a + b else b - a
  #or more complicated stuff
  return(value)
}

df$c <- apply(df, 1, myFunction)
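One thing to keep in mind with `apply()` on a data frame: it first converts the data frame with `as.matrix()`, so if any column is not numeric, every row arrives in the function as a character vector and `a == b` becomes a string comparison. A sketch with a hypothetical extra `label` column, selecting the numeric columns by name before calling `apply()` (`df2` and `rowFun` are illustrative names):

```r
df <- data.frame(a = c(1, 2, 3, 4, 5), b = c(1, 20, 3, 4, 50))

# All-numeric data frame: the matrix apply() works on is numeric
print(is.numeric(as.matrix(df)))     # TRUE

# Hypothetical character column: the whole matrix becomes character
df2 <- transform(df, label = c("p", "q", "r", "s", "t"))
print(is.character(as.matrix(df2)))  # TRUE

# Keep only the numeric columns and look values up by name, not position
rowFun <- function(x) {
  if (x[["a"]] == x[["b"]]) x[["a"]] + x[["b"]] else x[["b"]] - x[["a"]]
}
df2$c <- apply(df2[, c("a", "b")], 1, rowFun)
print(df2)
```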
Phann