I saw this post comparing the performance of different name lookup operations in R: http://broadcast.oreilly.com/2010/03/lookup-performance-in-r.html

Here are the results for name lookup performance using single bracket '[' on a vector. 1st row = vector length, 2nd row = time.

arrays, first element, by label, single bracket:

 1024  2048  4096  8192 16384 32768 
0.268 0.282 0.588 1.439 2.728 5.397 

arrays, last element, by label, single bracket:

 1024  2048  4096  8192 16384 32768     
0.173 0.278 0.582 1.517 2.713 5.266 
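
A timing experiment of this kind can be reproduced with a short script. The following is only a minimal sketch, not the code from the blog post: the helper name `time_lookup`, the label scheme `n1, n2, ...`, and the repetition count are assumptions made for illustration, and absolute timings will differ by machine and R version.

```r
# Hypothetical helper (not from the blog post): time `reps` single-bracket
# lookups of either the first or the last label in a named vector of length n.
time_lookup <- function(n, which = c("first", "last"), reps = 200L) {
  which <- match.arg(which)
  x <- seq_len(n)
  names(x) <- paste0("n", seq_len(n))        # unique labels n1, n2, ...
  label <- if (which == "first") names(x)[1L] else names(x)[n]
  system.time(for (i in seq_len(reps)) x[label])[["elapsed"]]
}

sizes <- 2^(10:15)                           # 1024 ... 32768, as in the tables
first <- sapply(sizes, time_lookup, which = "first")
last  <- sapply(sizes, time_lookup, which = "last")
print(rbind(size = sizes, first = first, last = last))
```

Comparing the `first` and `last` rows across sizes shows whether the position of the label matters on a given R installation.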

Does anybody know why the lookup times for the first and last elements are the same for single bracket on a named vector? Why is the lookup time for the first element linear in the vector length? The post says:

"When you use single-bracket notation, R tried to match all elements with a given label, including fuzzy matches. That means that R scans all the element in the array when you use single bracket notation."

However, this does not make sense to me: in my experiment, if there are duplicated names in a vector, '[' with one label returns only the first match, not all of them. If the name cannot be matched exactly, it returns NA.

> x = c('a'=1, 'b'=2, 'c' = 3, 'b'=4)
> x['b']
b 
2 
> x2 = c('ba' = 1, 'a'=2)
> x2['b']
<NA> 
  NA 
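
For comparison, the same two vectors can be probed with a few other base R lookups. This is a small sketch meant only to show the observable behavior; it makes no claim about how `[` is implemented internally.

```r
x <- c(a = 1, b = 2, c = 3, b = 4)

x["b"]                     # only the first element named "b"
x[names(x) == "b"]         # every element named "b" (explicit full scan)
match("b", names(x))       # 2: position of the first exact match

x2 <- c(ba = 1, a = 2)

x2["b"]                    # NA: single bracket requires an exact match
x2[["b", exact = FALSE]]   # 1: double bracket allows partial matching on request
```

So partial matching is available through `[[` with `exact = FALSE`, while single bracket with a character subscript matches names exactly and returns the first hit.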

  • Are you using the R version from when that blog post was written? Things may have changed. I would just trust the documentation (`` ?`[` ``) and the results of my own investigations. If the sort of thing that post was talking about matters in your application, maybe avoid indexing by name, and if that's not enough, use Rcpp. – Frank Nov 11 '15 at 16:29
  • I did the same experiment for the first label and got the same linear performance in the latest R. I just wonder how '[' is implemented in R. This is not something in my application, but rather my curiosity to understand more about R. The documentation on '[' does not give such detail. If I had a big table to look up, I would rather use an environment with hashing. – Eric Nov 11 '15 at 16:46
  • Ok. This might interest you: http://stackoverflow.com/q/19226816/1191259 – Frank Nov 11 '15 at 16:52
  • Ok. This means that I have to download the R base source code and find the C function. I was hoping someone who knows the details could give a description. – Eric Nov 11 '15 at 17:09
  • I don't think you have to download it; it sounds like Winston Chang has a mirror on github you can browse. Following the instructions at the bottom of the answer, I arrived here: https://github.com/wch/r-source/blob/trunk/src/main/subset.c (look for `do_subset`). It's possible someone more knowledgeable will come by and answer your question, too; it's only been an hour. – Frank Nov 11 '15 at 17:12
  • Thanks, Frank; the link should be helpful. – Eric Nov 11 '15 at 18:05
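
The hashed-environment alternative mentioned in the comments could look roughly like this. It is a minimal sketch under assumed names (`keys`, `vals`, `tbl` and the `n1, n2, ...` labels are illustrative only), not a benchmark or a recommendation from the thread.

```r
keys <- paste0("n", seq_len(32768))
vals <- seq_along(keys)

# Build a hashed environment that maps each key to its value.
tbl <- new.env(hash = TRUE, size = length(keys))
for (i in seq_along(keys)) assign(keys[i], vals[i], envir = tbl)

get("n1", envir = tbl, inherits = FALSE)              # 1
tbl[["n32768"]]                                       # 32768
exists("no_such_key", envir = tbl, inherits = FALSE)  # FALSE
```

Unlike a named vector, an environment holds at most one value per name, so the duplicated-name case from the question cannot arise here.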
