
So I have this riddle for all R aficionados out there:

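library(data.table)
set.seed(666)
res <- data.table(NULL)
for (i in 1:10) {
  # row i: the number i and a random permutation of the first i letters
  res <- rbind(res, data.table(a = i, b = paste0(letters[sample(1:i)], collapse = "")))
}
res <- res[sample(10)]  # shuffle the rows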

resulting in:

>res
       a          b
   1:  1          a
   2:  9  dhgcbeifa
   3:  3        cba
   4:  7    gcafdeb
   5:  6     eacdfb
   6:  8   dacbfehg
   7: 10 fehjaigcbd
   8:  4       dacb
   9:  5      daecb
  10:  2         ba

But in case A:

>t(apply(res,1,nchar))
      a  b
 [1,] 2  1
 [2,] 2  9
 [3,] 2  3
 [4,] 2  7
 [5,] 2  6
 [6,] 2  8
 [7,] 2 10
 [8,] 2  4
 [9,] 2  5
[10,] 2  2

However, in case B:

>res[,lapply(.SD, nchar)]
     a  b
  1: 1  1
  2: 1  9
  3: 1  3
  4: 1  7
  5: 1  6
  6: 1  8
  7: 2 10
  8: 1  4
  9: 1  5
 10: 1  2

My question: why does case A report 2 for every value in column a, when all of those numbers except 10 have only one digit?

amonk
  • `as.matrix(res)` – Frank Feb 12 '18 at 16:39
  • @Frank sure! but still a mystery, right? – amonk Feb 12 '18 at 16:41
  • You have so many bad practices 101 in R in this question that I really hope you made this MRE for demonstration purposes only. – David Arenburg Feb 15 '18 at 12:08
  • @DavidArenburg Do enlighten us,...Sir! – amonk Feb 15 '18 at 12:20
  • Growing objects in a loop, using `apply` with a data.table, calling vectorized functions such as `sample` and `paste0` in a loop, running `data.table` per row (!), etc. Each time I look at it I see something new. If you are using such practices in your real code, I won't be surprised if it runs for days. – David Arenburg Feb 15 '18 at 12:27
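A minimal sketch of building a table like the question's without growing it row by row, assuming a plain base R `data.frame` is acceptable in place of the data.table (the random strings are not guaranteed to match the output shown above). All rows are computed in one `vapply` pass and the table is created once.

```r
set.seed(666)
n <- 10

# One random permutation of the first i letters for each i, collected in one pass
b <- vapply(seq_len(n),
            function(i) paste0(letters[sample(i)], collapse = ""),
            character(1))

# Build the table once, then shuffle its rows once
res <- data.frame(a = seq_len(n), b = b, stringsAsFactors = FALSE)
res <- res[sample(n), ]
```

With data.table, the usual equivalent is to collect the per-row pieces in a list and bind them with a single `rbindlist()` call instead of calling `rbind` inside the loop.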

2 Answers


When you coerce `res` to a matrix (the first thing `apply` does), you get:

as.matrix(res)
#-------------------
      a    b           
 [1,] " 7" "eafdgcb"   
 [2,] " 2" "ab"        
 [3,] " 8" "efcbdhga"  
 [4,] " 1" "a"         
 [5,] "10" "hdeifajgbc"
 [6,] " 4" "dbac"      
 [7,] " 5" "daecb"     
 [8,] " 6" "eadbfc"    
 [9,] " 9" "chfdbiaeg" 
[10,] " 3" "acb" 
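A minimal reproduction of the same effect, assuming a plain `data.frame` stands in for the data.table so no extra package is needed. Because one column is character, `as.matrix` returns a character matrix and the integer column comes back padded to a common width, so `nchar` sees two characters per value.

```r
df <- data.frame(a = c(7L, 2L, 10L), b = c("eafdgcb", "ab", "hdeifajgbc"),
                 stringsAsFactors = FALSE)

m <- as.matrix(df)   # a single character matrix, because column b is character
typeof(m)            # "character"
m[, "a"]             # " 7" " 2" "10"
nchar(m[, "a"])      # 2 2 2   (what apply() passes on to nchar)
nchar(df$a)          # 1 1 2   (the column-wise result)
```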
IRTFM
  • Great! So the next question has to be asked... why are both columns parsed `as.character` and not `as.some.numerical` (int/num)? – amonk Feb 13 '18 at 09:02
  • Matrices in R can only be of one type, and there is a hierarchy of types, so `character` is "lower" on the hierarchy than `numeric` and is chosen as the "lowest common denominator" when both of these types are present in a data frame or other source of data. This is somewhat explained in the Details section of `?as.matrix`. – IRTFM Feb 13 '18 at 16:56
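A short illustration of the type hierarchy the comment refers to. In R's documented ordering (logical < integer < double < complex < character), mixing types promotes everything to the most general type present, which for numbers plus strings is character. This only uses `c()` and `cbind()`; it is a sketch, not the internals of `as.matrix`.

```r
typeof(c(TRUE, 1L))      # "integer"
typeof(c(1L, 2.5))       # "double"
typeof(c(2.5, "a"))      # "character"
c(2.5, "a")              # "2.5" "a"

# A matrix built from a number column and a string column is all character
m <- cbind(a = 1:2, b = c("x", "y"))
typeof(m)                # "character"
```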

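This is a conversion problem in `as.matrix`: because `res` has a character column, `res$a` is converted to character too, and the numbers are padded with blanks to the display width of the widest one (`10`). So `1` becomes `" 1"` and `nchar` returns 2.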

You can find a well-detailed explanation of this behaviour here.
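The padding comes from formatting numbers to a common width. A small sketch contrasting `format()`, which pads by default, with `format(trim = TRUE)` and `as.character()`, which do not; the vector `x` is a made-up example.

```r
x <- c(1L, 9L, 10L)

format(x)                 # " 1" " 9" "10"  padded to the width of "10"
format(x, trim = TRUE)    # "1"  "9"  "10"  leading blanks suppressed
as.character(x)           # "1"  "9"  "10"  no padding

nchar(format(x))          # 2 2 2  (like case A)
nchar(as.character(x))    # 1 1 2  (like case B)
```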

Terru_theTerror
  • Good find. Many R users are not aware that `apply()` implies coercion to `matrix`. – Uwe Feb 12 '18 at 19:02
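A sketch of what that coercion means in practice, using a small hypothetical `data.frame`: every row that `apply()` hands to the function is a character vector, while column-wise tools such as `vapply`/`sapply` (the equivalent of case B's `lapply(.SD, ...)`) keep each column's own type. When a per-row computation across columns is genuinely needed, `mapply` over the columns avoids building a matrix.

```r
df <- data.frame(a = c(1L, 10L), b = c("x", "yz"), stringsAsFactors = FALSE)

# apply() runs as.matrix(df) first, so each row arrives as character
apply(df, 1, class)                  # "character" "character"

# Column-wise functions see each column with its original type
vapply(df, class, character(1))      # a = "integer", b = "character"
sapply(df, nchar)                    # column a: 1 2, column b: 1 2

# Row-wise work across columns without coercing to a matrix
mapply(function(a, b) nchar(a) + nchar(b), df$a, df$b)   # 2 4
```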