
I have a list of 10,000 vectors, and each vector might have different elements and different lengths. I would like to know how many unique vectors I have and how often each unique vector appears in the list.

I guess the way to go is the `unique` function, but I don't know how to use it to also get the number of times each vector is repeated.

So what I would like to get is something like this:

"a" "b" "c" "d" 301

"a" 277

"b" "c" 49

where the letters are the contents of each unique vector and the numbers are how often each one is repeated.

I would really appreciate any possible help on this.

Thank you very much in advance.

Tina.

user18441

1 Answer


Maybe you should look at `table`:

Some sample data:

```r
myList <- list(A = c("A", "B"),
               B = c("A", "B"),
               C = c("B", "A"),
               D = c("A", "B", "B", "C"),
               E = c("A", "B", "B", "C"),
               F = c("A", "C", "B", "B"))
```

Paste your vectors together and tabulate them.

```r
table(sapply(myList, paste, collapse = ","))
# 
#     A,B A,B,B,C A,C,B,B     B,A 
#       2       2       1       1 
```
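Since the question mentions `unique`, here is a sketch that works on the list directly instead of pasting. It assumes that `unique()` and `match()` both accept lists (per `?match`, lists are converted to character vectors before matching) and uses `tabulate()` to count. It does not require choosing a separator, so `c("A,B")` and `c("A", "B")` stay distinct, which the `paste(collapse = ",")` keys above would merge. The output format imitates the layout in the question; `myList` is the same sample data, repeated so the block runs on its own.

```r
# Sample data from above
myList <- list(A = c("A", "B"),
               B = c("A", "B"),
               C = c("B", "A"),
               D = c("A", "B", "B", "C"),
               E = c("A", "B", "B", "C"),
               F = c("A", "C", "B", "B"))

# Distinct vectors, in order of first appearance
uniq <- unique(myList)

# For every list element, the position of its matching unique vector
idx <- match(myList, uniq)

# How many elements map to each unique vector
counts <- tabulate(idx, nbins = length(uniq))

# One line per unique vector: quoted elements followed by the count
out <- vapply(seq_along(uniq), function(i)
  paste(paste0('"', uniq[[i]], '"', collapse = " "), counts[i]),
  character(1))
writeLines(out)
```

Unlike `table()`, this keeps the unique vectors in order of first appearance rather than sorting them alphabetically.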

You don't specify whether order matters (that is, whether A, B is the same as B, A). If it doesn't, you can sort each vector before pasting:

```r
table(sapply(myList, function(x) paste(sort(x), collapse = ",")))
# 
#     A,B A,B,B,C 
#       3       3 
```
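If you also want to know which elements of the list fall into each group, the same sorted keys can be passed to `split()`. This grouping step is an addition for illustration, not part of the answer above; `myList` is the sample data repeated.

```r
myList <- list(A = c("A", "B"),
               B = c("A", "B"),
               C = c("B", "A"),
               D = c("A", "B", "B", "C"),
               E = c("A", "B", "B", "C"),
               F = c("A", "C", "B", "B"))

# Order-insensitive key: sort each vector before pasting
keys <- sapply(myList, function(x) paste(sort(x), collapse = ","))

# Names of the list elements sharing each key;
# groups[["A,B"]] is c("A", "B", "C")
groups <- split(names(myList), keys)

# Group sizes, the same counts that table(keys) reports
lengths(groups)
```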

Wrap this in `data.frame` to get vertical output instead of horizontal, which might be easier to read.
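A minimal sketch of the vertical version, assuming you want the most frequent vectors first as in the question's example. It uses `as.data.frame()` on the table because its `responseName` argument renames the default `Freq` column; the column name `count` is an arbitrary choice.

```r
myList <- list(A = c("A", "B"),
               B = c("A", "B"),
               C = c("B", "A"),
               D = c("A", "B", "B", "C"),
               E = c("A", "B", "B", "C"),
               F = c("A", "C", "B", "B"))

keys <- sapply(myList, paste, collapse = ",")

# One row per distinct key: a "keys" column and a "count" column
counts <- as.data.frame(table(keys), responseName = "count",
                        stringsAsFactors = FALSE)

# Most frequent first
counts <- counts[order(counts$count, decreasing = TRUE), ]
counts
```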


Also, do be sure to read "How to make a great R reproducible example?" as already suggested to you.

As it is, I'm just guessing at what you're trying to do.

A5C1D2H2I1M1N2O1R2T1
  • Can I ask you if `rle(sort(sapply(myList, paste, collapse = ",")))` would be slower or faster? TIMTOWTDI in R is killing me... [I can always try tomorrow with some toy examples, yes... but maybe it's trivial] – vodka Apr 04 '13 at 18:15
  • @vodka, no idea. Try running some benchmarks with the rbenchmark or microbenchmark packages. – A5C1D2H2I1M1N2O1R2T1 Apr 04 '13 at 18:17
  • Faster with table: 15.734 vs 20.212 on a list of 6000 elements. – vodka Apr 06 '13 at 10:08
  • @vodka, thanks for testing. There are indeed a lot of different ways to do things in R, but that is something I *like* about R. Don't know if I would lose any sleep over 5 seconds though :) – A5C1D2H2I1M1N2O1R2T1 Apr 10 '13 at 06:56
  • I was fond of Perl's TIMTOWTDIness... right now I sometimes feel a little lost, since I'm just starting with R and its general approach differs from standard imperative/OOP programming. In this case the difference is not that big (with 6000 elements), but I'm trying to learn about performance issues because I have seen examples of huge differences. Thank you for your help! – vodka Apr 10 '13 at 10:45
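For anyone wanting to repeat the comparison from these comments, here is a rough sketch. The synthetic list (6000 vectors of one to four letters drawn from `a` to `e`, with `set.seed(1)`) is a hypothetical stand-in, not the data used in the comments, and it times with base `system.time()` so it runs without extra packages; the microbenchmark or rbenchmark packages suggested above give more reliable numbers. Timings will vary by machine and data, so this does not reproduce the figures quoted.

```r
# Hypothetical synthetic data, only for illustration
set.seed(1)
bigList <- replicate(6000,
                     sample(letters[1:5], sample(1:4, 1), replace = TRUE),
                     simplify = FALSE)
keys <- vapply(bigList, paste, character(1), collapse = ",")

# Approach from the answer: tabulate the pasted keys
viaTable <- table(keys)

# Approach from the comment: sort the keys, then count runs of equal values
viaRle <- rle(sort(keys))

# Both should report the same keys in the same order, with the same counts
stopifnot(identical(names(viaTable), viaRle$values),
          identical(as.vector(viaTable), viaRle$lengths))

# Crude timing: repeat each approach 20 times
system.time(for (i in 1:20) table(keys))
system.time(for (i in 1:20) rle(sort(keys)))
```

Both approaches sort the keys with the same locale collation, which is why the keys come out in the same order.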