
I have a list of character vectors, and I would like to access the last value of each element.

mylist<-list(A=c("a"),
             B=c("a","b"),
             C=c("a","b","c"),
             D=c("a","b","c","d"))

At first (after looking at some related Python threads), I thought I could do something like:

for(i in 1:length(mylist)){
   print(mylist[[i]][-1])
}
# character(0)
# [1] "b"
# [1] "b" "c"
# [1] "b" "c" "d"

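This doesn't work, because in R a negative index drops that element instead of counting from the end. The result I want is what this function produces:

myfunction <- function(mylist) {
  output <- character(0)
  # seq_along() also handles an empty list, unlike 1:length(mylist)
  for (i in seq_along(mylist)) {
    output <- c(output, mylist[[i]][length(mylist[[i]])])
  }
  return(output)
}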

myfunction(mylist)
# [1] "a" "b" "c" "d"

Is there a more efficient way?

wyatt

2 Answers


As Rich Scriven pointed out in the (now deleted) comments, there are many ways to accomplish this task. One is to use sapply and tail with the argument n = 1:

sapply(mylist, tail, n = 1)
#  A   B   C   D 
#"a" "b" "c" "d" 

A safer and potentially faster variant of the same idea is vapply, which requires you to declare the type and length of each result:

vapply(mylist, tail, FUN.VALUE = character(1), n = 1)
# or a little shorter
# vapply(mylist, tail, "", 1)
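To illustrate what "safer" means here, the following is a minimal sketch using a hypothetical list x (not from the question) in which one element is empty. sapply quietly changes its return type, while vapply stops with an error because the result does not match the declared character(1).

```r
# Hypothetical input: element B is empty, so it has no last value.
x <- list(A = c("a", "b"), B = character(0))

# sapply() cannot simplify results of differing lengths,
# so it falls back to returning a list instead of a character vector.
res_sapply <- sapply(x, tail, n = 1)
is.list(res_sapply)

# vapply() checks every result against FUN.VALUE and raises an error
# for the zero-length result; tryCatch() turns that into a marker value.
res_vapply <- tryCatch(
  vapply(x, tail, FUN.VALUE = character(1), n = 1),
  error = function(e) "error"
)
res_vapply
```

With vapply the problem surfaces at the call site rather than later, when downstream code expects a character vector.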

(another) benchmarking

library(microbenchmark)
library(ggplot2) # provides autoplot()

set.seed(1)
mylist <- replicate(1e5, list(sample(letters, size = runif(1, 1, length(letters)))))

benchmark <- microbenchmark(
  f1 = {myfunction(mylist)},
  f2 = {sapply(mylist, function(l) l[length(l)])},
  f3 = {vapply(mylist, function(l) l[length(l)], "")},
  f4 = {sapply(mylist, tail, 1)},
  f5 = {vapply(mylist, tail, "", 1)},
  f6 = {mapply("[", mylist, lengths(mylist))},
  f7 = {mapply("[[", mylist, lengths(mylist))}, # added this out of curiosity
  f8 = {unlist(mylist)[cumsum(lengths(mylist))]},
  times = 100L
)

autoplot(benchmark)

Same result here: Rich's unlist(mylist)[cumsum(lengths(mylist))] is by far the fastest. There seems to be no real difference between sapply and vapply, while both tail variants (f4, f5) are roughly ten times slower than indexing with l[length(l)] (f2, f3). myfunction() is the function defined in OP's question.

#benchmark
#Unit: milliseconds
# expr         min          lq        mean     median          uq        max neval
#   f1 28797.26121 30462.16785 31836.26875 31191.7762 32950.92537 36586.5477   100
#   f2   106.34213   117.75074   127.97763   124.9191   134.82047   176.2058   100
#   f3    99.72042   106.87308   119.59811   113.9663   123.63619   465.5335   100
#   f4  1242.11950  1291.38411  1409.35750  1350.3460  1505.76089  1880.6537   100
#   f5  1189.22615  1274.48390  1366.07234  1333.8885  1418.75394  1942.2803   100
#   f6   112.27316   123.73429   132.39888   129.8220   138.33851   191.2509   100
#   f7   107.27392   118.19201   128.06681   123.1317   133.29827   208.8425   100
#   f8    28.03948    28.84125    31.19637    30.3115    32.94077    40.9624   100
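The reason the unlist/cumsum approach (f8) works may not be obvious, so here is a small sketch on the question's data: after flattening, the running total of the element lengths is exactly the position of each element's last value. The second part uses a hypothetical list x to show a caveat, namely that an empty element silently picks up the previous element's last value.

```r
mylist <- list(A = c("a"),
               B = c("a", "b"),
               C = c("a", "b", "c"),
               D = c("a", "b", "c", "d"))

# Flatten everything into one character vector of length 10.
flat <- unlist(mylist, use.names = FALSE)

# Running total of the lengths: positions 1, 3, 6 and 10,
# i.e. where each original element ends inside 'flat'.
ends <- cumsum(lengths(mylist))

last_values <- flat[ends]
last_values

# Caveat (hypothetical input): an empty element contributes length 0,
# so its "end" repeats the previous position.
x <- list(A = c("a", "b"), B = character(0), C = "c")
wrong <- unlist(x, use.names = FALSE)[cumsum(lengths(x))]
wrong
```

So this variant is only a drop-in replacement when every element has at least one value.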
markus

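Benchmarking the solutions proposed in the comments, we find that Rich's proposal using unlist is the fastest.

By inspecting the source of unlist() and lengths() and switching off name handling, we can make it even faster: the mm entry below calls .Internal(unlist(...)) directly with recursive = FALSE and use.names = FALSE, and passes use.names = FALSE to lengths().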
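Since .Internal() is not meant to be called from user code, a variant that stays within exported functions might look like the sketch below. The helper name last_elements is hypothetical, and the function assumes every list entry is a non-empty atomic vector; it was not part of the benchmarks in this thread, so no timing claim is made for it.

```r
mylist <- list(A = c("a"),
               B = c("a", "b"),
               C = c("a", "b", "c"),
               D = c("a", "b", "c", "d"))

# Hypothetical helper: last value of each list entry, without .Internal().
# Assumption: every entry is a non-empty atomic vector.
last_elements <- function(x) {
  stopifnot(is.list(x), all(lengths(x) > 0L))
  # Skip name construction, which is the tweak the text describes.
  flat <- unlist(x, recursive = FALSE, use.names = FALSE)
  flat[cumsum(lengths(x, use.names = FALSE))]
}

last_elements(mylist)
```

The stopifnot() guard trades a little speed for protection against empty entries, which would otherwise return a neighbouring element's value.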

The slowness of tail is discussed here: https://stackoverflow.com/a/37238415/2270475

On OP's sample data:

library(microbenchmark)
microbenchmark(
  r2evans = sapply(mylist, function(l) l[length(l)]),
  markus  = sapply(mylist, tail, 1),
  Rich1   = mapply("[", mylist, lengths(mylist)),
  Rich2   = unlist(mylist)[cumsum(lengths(mylist))],
  markus2 = vapply(mylist, tail, character(1), 1),
  mm      = .Internal(unlist(mylist,FALSE,FALSE))[cumsum(lengths(mylist,FALSE))],
  unit = "relative"
)
# Unit: relative
#     expr       min        lq      mean    median        uq         max neval
#  r2evans 16.083333 12.764706 25.545957 12.368421 13.133333 122.1428571   100
#   markus 82.333333 59.294118 50.937673 60.342105 60.644444  10.2253968   100
#    Rich1 19.583333 15.294118 13.368047 15.394737 15.622222   2.7492063   100
#    Rich2  4.166667  3.705882  3.211045  3.789474  3.911111   0.7650794   100
#  markus2 73.166667 53.176471 44.669822 50.263158 54.155556  10.4857143   100
#       mm  1.000000  1.000000  1.000000  1.000000  1.000000   1.0000000   100

On a 1000 times longer list:

mylist_long <- do.call(c, replicate(1000, mylist, simplify = FALSE))
length(mylist_long) # [1] 4000

microbenchmark(
  r2evans = sapply(mylist_long, function(l) l[length(l)]),
  markus  = sapply(mylist_long, tail, 1),
  Rich1   = mapply("[", mylist_long, lengths(mylist_long)),
  Rich2   = unlist(mylist_long)[cumsum(lengths(mylist_long))],
  markus2 = vapply(mylist_long, tail, character(1), 1),
  mm      = .Internal(unlist(mylist_long,FALSE,FALSE))[cumsum(lengths(mylist_long,FALSE))],
  unit = "relative"
)
# Unit: relative
#     expr       min        lq      mean    median        uq       max neval
#  r2evans  26.14882  27.20436  27.07436  28.13731  28.54701  27.23846   100
#   markus 679.57251 698.84828 668.00160 715.30180 674.71067 443.42502   100
#    Rich1  27.53607  28.80581  29.82736  29.00353  31.02343  38.79978   100
#    Rich2  22.39863  21.79129  20.41467  21.53371  20.70750  13.03032   100
#  markus2 667.97494 702.14882 676.91881 718.41899 696.11934 633.17181   100
#       mm   1.00000   1.00000   1.00000   1.00000   1.00000   1.00000   100
moodymudskipper