
In my data.table, I wanted to number the entries within each by group, but only when that group contains more than one row:

library(data.table)
dt1 <- data.table(col1=1:4, col2 = c('A', 'B', 'B', 'C'))
#    col1 col2
# 1:    1    A
# 2:    2    B
# 3:    3    B
# 4:    4    C

dt1[, col3:={
  if (.N>1) {paste0((1:.N), "_", col2)} else {col2};
}, by=col2]

#    col1 col2 col3
# 1:    1    A    A
# 2:    2    B  1_B
# 3:    3    B  2_B
# 4:    4    C    C

This works fine, but it doesn't work when I use ifelse() instead:

dt1[, col4:=ifelse (.N>1, paste0((1:.N), "_", col2), col2), by=col2]
#    col1 col2 col3 col4
# 1:    1    A    A    A
# 2:    2    B  1_B  1_B
# 3:    3    B  2_B  1_B
# 4:    4    C    C    C

Can anyone explain why?

Matt Dowle
Vasily A
  • `ifelse` returns one value for each test. You wanted more than one. It is only returning `1_B` as it was built to. – Pierre L Feb 10 '16 at 21:11
  • @PierreLafortune, thanks, it seems to be indeed a duplicate of that question. I just expected that "one value" could be a vector value as well. The current behavior looks rather counter-intuitive to me, but OK – Vasily A Feb 10 '16 at 21:15

1 Answer


This is only indirectly related to data.table; at its core, the issue is that ifelse is designed to be used like this:

ifelse(test, yes, no)

where test, yes, and no all have the same length. The output has the same length as test: each element where test is TRUE is taken from the corresponding element of yes, and each element where test is FALSE is taken from the corresponding element of no.
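A minimal illustration of that intended, fully vectorized use. The vectors `test`, `yes`, and `no` are made up for the example; each output element is picked position by position:

```r
# All three arguments have length 4, so each position is resolved independently
test <- c(TRUE, FALSE, TRUE, FALSE)
yes  <- c("y1", "y2", "y3", "y4")
no   <- c("n1", "n2", "n3", "n4")

res <- ifelse(test, yes, no)
print(res)
# [1] "y1" "n2" "y3" "n4"
```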

When test is a scalar and yes or no are vectors, as in your case, you have to look at what ifelse does internally to understand what's going on.

Relevant source (from base R's ifelse):

if (any(test[ok]))  # is any non-NA element of `test` TRUE?
    ans[test & ok] <- rep(yes, length.out = length(ans))[test & ok]
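To make the excerpt easier to follow, here is a simplified sketch of that logic. `simple_ifelse` is a hypothetical name, and it is an assumption-laden reduction: it keeps the NA handling and the `rep(..., length.out = length(ans))` recycling step, but omits the type coercion and attribute handling of the real function. The key point is that yes and no are always stretched or cut to the length of test:

```r
# Hypothetical, simplified version of base R's ifelse (not the real source):
# `ans` starts as a copy of `test`, so the result always has length(test) elements.
simple_ifelse <- function(test, yes, no) {
  ans <- test
  ok <- !is.na(test)                      # NA positions stay NA
  if (any(test[ok]))                      # fill TRUE positions from `yes`
    ans[test & ok] <- rep(yes, length.out = length(ans))[test & ok]
  if (any(!test[ok]))                     # fill FALSE positions from `no`
    ans[!test & ok] <- rep(no, length.out = length(ans))[!test & ok]
  ans
}

# Agrees with base ifelse on a vector test containing an NA
print(simple_ifelse(c(TRUE, NA, FALSE), c("a", "b", "c"), c("x", "y", "z")))
# [1] "a" NA  "z"
```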

What is rep(c(1, 2), length.out = 1)? It's just 1; the second element is truncated.
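A small check of both directions of `rep(..., length.out = )`, followed by the scalar-test case from the question's group "B" (where `.N` is 2, so the test `.N > 1` is a single TRUE). The variable names here are illustrative only:

```r
# length.out shorter than the input: truncation
r_trunc <- rep(c(1, 2), length.out = 1)   # 1
# length.out longer than the input: recycling
r_recyc <- rep(c(1, 2), length.out = 3)   # 1 2 1

# A length-1 test means yes is cut down to length 1
yes_B <- paste0(1:2, "_", "B")            # "1_B" "2_B"
out_B <- ifelse(2 > 1, yes_B, "B")
print(out_B)
# [1] "1_B"
```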

That's what happened here: the value of ifelse is only the first element of paste0(1:.N, "_", col2). When that length-1 result is passed to `:=`, it is recycled to fill every row of the group.
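One way to see this directly is to measure, per group, how long the j expression's value is before `:=` gets it. This sketch assumes the data.table package is installed; `len_ifelse`, `len_if`, and the `len` column are illustrative names:

```r
library(data.table)

dt1 <- data.table(col1 = 1:4, col2 = c("A", "B", "B", "C"))

# Length of the value each group's j expression produces
len_ifelse <- dt1[, .(len = length(ifelse(.N > 1, paste0(1:.N, "_", col2), col2))), by = col2]
len_if     <- dt1[, .(len = length(if (.N > 1) paste0(1:.N, "_", col2) else col2)), by = col2]

print(len_ifelse)  # group B yields length 1, so := recycles "1_B" to both rows
print(len_if)      # group B yields length 2, one value per row
```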

When your logical condition is a scalar, you should use if, not ifelse. I'll also add that I do my damnedest to avoid using ifelse in general because it's slow.
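A sketch contrasting the two correct patterns, assuming data.table is installed. The first line is the question's per-group if/else. The second approach is an alternative (not from the original answer) that uses ifelse the way it was designed: it first builds full-length vectors so that test, yes, and no all have one element per row. The helper columns `n` and `idx` and the output column `col5` are hypothetical names:

```r
library(data.table)

dt1 <- data.table(col1 = 1:4, col2 = c("A", "B", "B", "C"))

# Scalar condition per group: if/else returns a full-length vector for group B
dt1[, col3 := if (.N > 1) paste0(seq_len(.N), "_", col2) else col2, by = col2]

# Vectorized alternative: compute group size and in-group index per row first,
# then call ifelse once over whole columns of equal length
dt1[, `:=`(n = .N, idx = seq_len(.N)), by = col2]
dt1[, col5 := ifelse(n > 1, paste0(idx, "_", col2), col2)]
dt1[, c("n", "idx") := NULL]   # drop the helper columns

print(dt1)
```

Both columns end up as "A", "1_B", "2_B", "C"; the if/else form avoids ifelse entirely, in line with the advice above.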

MichaelChirico