
I want a function whose input is a vector of 1s, 2s, and 3s which sends 1 to .2, 2 to .4 and 3 to .5. (The output should be a vector of equal length.) How do I accomplish this?

For example, if

myVector<-c(1,2,3,2,3,3,1)

Then the function

mapVector(myVector)

should return a vector like (.2,.4,.5,.4,.5,.5,.2)

indira
Ben

2 Answers


A few options, all using the same input:

myVector<-c(1,2,3,2,3,3,1)

Factor

newvals <- c(.2,.4,.5)
newvals[as.factor(myVector)]
#[1] 0.2 0.4 0.5 0.4 0.5 0.5 0.2
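One caveat with the factor approach, sketched below: `as.factor` builds its levels from the values that actually occur, so if one of 1, 2, 3 is missing from the input the codes shift. The vector `partial` is a made-up example, and fixing the levels with `factor(..., levels = 1:3)` is one possible workaround rather than part of the original answer.

```r
newvals <- c(.2, .4, .5)

# Hypothetical input that contains no 2s
partial <- c(1, 3, 3)

# Levels are "1" and "3", so 3 gets code 2 and is mapped to .4
wrong <- newvals[as.factor(partial)]

# Declaring all three levels keeps the codes aligned with newvals
right <- newvals[factor(partial, levels = 1:3)]

print(wrong)
print(right)
```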

Named vector

newvals <- c(`1`=.2,`2`=.4,`3`=.5)
newvals
#  1   2   3 
#0.2 0.4 0.5 

newvals[as.character(myVector)]
#  1   2   3   2   3   3   1 
#0.2 0.4 0.5 0.4 0.5 0.5 0.2 
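The named-vector result carries the lookup keys along as names. If a plain numeric vector is wanted, as in the question, wrapping the lookup in `unname()` should drop them. A minimal sketch (the variable `res` is introduced here only for illustration):

```r
myVector <- c(1, 2, 3, 2, 3, 3, 1)
newvals  <- c(`1` = .2, `2` = .4, `3` = .5)

# Look up by name, then strip the names from the result
res <- unname(newvals[as.character(myVector)])
print(res)
```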

Lookup table

mapdf <- data.frame(old=c(1,2,3),new=c(.2,.4,.5))
mapdf$new[match(myVector,mapdf$old)]
#[1] 0.2 0.4 0.5 0.4 0.5 0.5 0.2
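The question asks for a function called `mapVector`. One way to get it is to wrap the `match` lookup above. This is only a sketch: the argument names `old` and `new` and their defaults are assumptions taken from the example values, not anything prescribed.

```r
# Map each element of x from `old` to the corresponding entry of `new`.
# Values of x not found in `old` come back as NA.
mapVector <- function(x, old = c(1, 2, 3), new = c(.2, .4, .5)) {
  stopifnot(length(old) == length(new))
  new[match(x, old)]
}

myVector <- c(1, 2, 3, 2, 3, 3, 1)
mapped <- mapVector(myVector)
print(mapped)
```

Because `match` returns `NA` for values it cannot find, unexpected inputs surface as `NA` in the output instead of being silently mapped to the wrong value.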

Benchmarks to quantify @Joe's comment below and address @Ananda's comment as well.

myVector <- c(1,2,3,2,3,3,1)
# setup for the benchmarking
test <- sample(myVector,1e6,replace=TRUE)
newvals <- c(.2,.4,.5)
newvalsvec <- c(`1`=.2,`2`=.4,`3`=.5)
mapdf <- data.frame(old=c(1,2,3),new=c(.2,.4,.5))

library(microbenchmark)  # provides microbenchmark()
microbenchmark(
  newvals[as.factor(test)],
  newvalsvec[as.character(test)],
  mapdf$new[match(test,mapdf$old)],
  newvals[test],
  times=10L
)

#Unit: milliseconds
#         expr        min         lq     median         uq        max
#factor        1863.40146 1876.04197 1890.99147 1913.13046 2014.23609
#namedvector   1809.26883 1812.76272 1837.18852 1851.42954 1858.44996
#lookup          38.48697   38.83405   39.90146   69.65140   71.75051
#newvals[test]   34.07380   34.55885   50.61287   65.69495   66.08699
thelatemail
  • I doubt this is an issue in this case, but it's worth noting that the lookup table is by far the most efficient of these solutions. – Joe Aug 27 '13 at 05:40
  • @Joe - added some benchmarks - not a really huge difference between the named vector and lookup table methods. The factor method is definitely slower though. – thelatemail Aug 27 '13 at 06:03
  • Thela, your benchmarks are pretty sloppily done and not reproducible! Also, why the conversion to `as.factor` and `as.character` in your answers? My approach if the data really are this simple would be to just do `c(.2, .4, .5)[test]`, which should be very fast. – A5C1D2H2I1M1N2O1R2T1 Aug 27 '13 at 09:08
  • @AnandaMahto - that is true for the simplest examples - although the basic indexing answer falls over if the original `myVector` data is say `2:4` - then the indexing is direct and not an ordered matching from lowest-to-highest. The same with the `as.character` - the matching is done to the names of the `newvals` vector not the vector values themselves. I'll re-do the benchmarks - they were a rush job and it shows. – thelatemail Aug 27 '13 at 23:06
  • Just ran this on my computer and I'm now seeing the index way newvals[test] as twice as fast as lookup. I'd still do the lookup, though. – Frank Sep 11 '16 at 23:28
  • @Frank - I get the same result now too - 28ms vs 14ms over here - the absolute difference is tiny though considering this is chewing a million records. – thelatemail Sep 12 '16 at 00:46
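The point raised in the comments about data such as `2:4` can be illustrated with a small sketch. Here the old values are assumed to be 2, 3, 4 instead of 1, 2, 3 (the vector `shifted` is made up for this example): direct indexing treats the values as positions, while `match` looks them up against the old values.

```r
newvals <- c(.2, .4, .5)

# Hypothetical data whose values run 2..4, meant to map 2 -> .2, 3 -> .4, 4 -> .5
shifted <- c(2, 3, 4, 3)

# Direct indexing uses the values as positions: 2 hits .4 and 4 is out of range
direct <- newvals[shifted]

# Matching against the old values gives the intended mapping
matched <- newvals[match(shifted, 2:4)]

print(direct)
print(matched)
```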
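install.packages("hash")  # one-time setup
library(hash)
# build a hash mapping the keys 1:3 to the new values
h <- hash(1:3, c(.2,.4,.5))
myVector <- c(1,2,3,2,3,3,1)
# look up each element by its character key
sapply(myVector, function(x) h[[as.character(x)]])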
brown10