2

I would like to know how I can populate a column in a data.frame or data.table based on the value of other columns.

for example:

data.table(a = c(1:5), b = c(5:1), c = rep("",5))
 a b c
 1 5  
 2 4  
 3 3  
 4 2  
 5 1  

I want to populate c to:

  • indicate "More" if "a" is greater than "b"
  • "Less" if "a" is less than "b"
  • "Equal if they are equal

as below:

a b c
1 5 "Less" 
2 4 "Less" 
3 3 "Equal"
4 2 "More"
5 1 "More"

I know that this can be done through a for loop with multiple if statements, but I have a very large data set and I would like to do this using the "Apply" family of functions.

Any help would be greatly appreciated.

Ayelavan
  • 933
  • 1
  • 8
  • 10
  • 2
    Possible duplicate of [Nested ifelse statement in R](http://stackoverflow.com/questions/18012222/nested-ifelse-statement-in-r) – shrgm May 16 '16 at 05:35
  • Using `apply` would again invite for nested `ifelse`. Not sure if there is another way to use `apply` . `apply(df, 1, function(x) ifelse(x[1] > x[2], "More", ifelse(x[1] < x[2], "Less", "Equal")))` – Ronak Shah May 16 '16 at 05:37
  • The `apply` option mentioned above is not needed as it is a data.table. Also, looping through rows and using `ifelse` is highly inefficient. I already posted some efficient methods on my solution. – akrun May 16 '16 at 05:40
  • exactly. Just wanted to clarify why OP insisted on using `apply` – Ronak Shah May 16 '16 at 05:44

2 Answers2

4

Here's one that seems bizarrely efficient by my testing, and is not overly complex:

dt1[, c := c("less","equal","more")[max.col(setDT(.(a < b, a==b, a > b)))] ]

Another option using interaction which is about as good as I can do in simplification of code:

dt1[, c := c("equal","less","more")[interaction(a < b, a > b)] ]

#   a b     c
#1: 1 5  less
#2: 2 4  less
#3: 3 3 equal
#4: 4 2  more
#5: 5 1  more

This works because interaction will return four possibilities for two comparisons:

  1. FALSE.FALSE = neither TRUE, which in this case means not larger or smaller, therefore equal

  2. TRUE.FALSE = first TRUE, therefore less

  3. FALSE.TRUE = second TRUE, therefore more
  4. TRUE.TRUE = both TRUE, theoretically impossible except for fringe cases below the error tolerance.
thelatemail
  • 91,185
  • 12
  • 128
  • 188
3

We can use a numeric method to create unique values based on the logical vectors (a > b and a==b), convert to factor, get the numeric storage values by converting to numeric, replace that with the elements in the vector (c("Less", "More", "Equal")) and assign (:=) it as 'c'.

dt1[, c :=c("Less", "More", "Equal")[as.numeric(factor(1+2*(a>b) + 4*(a==b)))]]
dt1
#   a b     c
#1: 1 5  Less
#2: 2 4  Less
#3: 3 3 Equal
#4: 4 2  More
#5: 5 1  More

Or we can use a nested ifelse to get the expected output.

dt1[, c:= ifelse(a>b, "More", ifelse(a<b, "Less", "Equal"))]

Or another option would be using the Reduce/max.col/pmax to get a numeric index and replace it with the character vector as showed in the first method.

dt1[,  c:= c("Equal", "Less", "More")[pmax(max.col(.SD),
                         3*Reduce(`==`, .SD))], .SDcols = a:b]

data

dt1 <-  data.table(a = c(1:5), b = c(5:1), c = rep("",5))
akrun
  • 874,273
  • 37
  • 540
  • 662
  • Both methods worked like a charm in the given example. Though I think I prefer the second method to apply on my specific case as I find the first method a bit hard to understand. – Ayelavan May 16 '16 at 05:52
  • 1
    can the first option be simplified to `dt1[, c := c("less","more","equal")[ (1 + (a >= b) + (a == b)) ] ]`? – SymbolixAU May 16 '16 at 05:56
  • @Symbolix Yes, It can be done, I thought by multiplying, we get the unique values. – akrun May 16 '16 at 06:25