
I have a vector of lists (effectively a 2D array). Each list contains IDs, and the number of IDs varies from list to list. I want to sort the vector by the lists' contents (by first ID, then second ID, and so on). I also want to count the duplicates in the vector, where duplicates are lists containing the same IDs in any permutation. For example:

vec = c( list(c(1,2)),list(c(1,2,3)),list(c(1,2)),list(c(2,3)),list(c(1,3,2)) )
vec
[[1]]
[1] 1 2

[[2]]
[1] 1 2 3

[[3]]
[1] 1 2

[[4]]
[1] 2 3

[[5]]
[1] 1 3 2

I want the output to contain the sorted unique lists together with the number of times each one occurs. Hence, the output must be in the order [[1]] -> [[2]] -> [[4]], with frequencies (2,2,1).

  • Could you provide a small reproducible example and the expected output based on it? – akrun Dec 21 '15 at 10:54
  • Please consider reading up on [ask] and how to produce a [reproducible example](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example). – Heroka Dec 21 '15 at 10:54
  • The expected output is not clear. Why does the second element, 1 2 3, come before the fourth element, 2 3? Also, I guess the fifth element, 1 3 2, should be sorted before computing the frequency? – akrun Dec 21 '15 at 11:35
  • First, the duplicates are deleted. Hence 1 3 2 gets deleted as it has the same elements as 1 2 3. We are left with [[1]],[[2]] & [[4]]. Then we sort ascending by order of elements (compare all first elements, then second and then 3rd). Hence the order. – Ankur Lahiri Dec 21 '15 at 11:48

1 Answer


We can try

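# Sort within each element so that permutations become identical
l1 <- lapply(vec, sort)
# Keep only the first occurrence of each element
l2 <- l1[!duplicated(l1)]
# Truncate every element to the length of the shortest one
l3 <- lapply(l2, `length<-`, min(lengths(l2)))
# Paste the IDs into one number per element and order by it
i4 <- order(as.numeric(sapply(l3, paste, collapse='')))
l2[i4]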
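
The snippet above compares only as many IDs as the shortest element has, and pasting digits together can misorder multi-digit IDs (for example, c(1, 10) becomes 110 and sorts after c(2, 3), which becomes 23). A minimal sketch of a column-by-column comparison follows. It assumes the IDs are finite numbers; `order_id_sets` is a hypothetical helper name, not part of the original answer.

```r
# Hypothetical helper: deduplicate and order ID sets element by element.
# Assumes every ID is a finite number.
order_id_sets <- function(sets) {
  # Sort within each element so that permutations become identical.
  sorted <- lapply(sets, sort)
  uniq <- sorted[!duplicated(sorted)]

  # Pad to the longest length with -Inf, so that a shorter element sorts
  # before a longer one that starts with the same IDs.
  width <- max(lengths(uniq))
  padded <- do.call(rbind, lapply(uniq, function(x) {
    c(x, rep(-Inf, width - length(x)))
  }))

  # Order by the first column, then the second, and so on.
  keys <- lapply(seq_len(width), function(j) padded[, j])
  uniq[do.call(order, keys)]
}

vec <- list(c(1, 2), c(1, 2, 3), c(1, 2), c(2, 3), c(1, 3, 2))
result <- order_id_sets(vec)
result
```

For the example in the question this returns the elements 1 2, then 1 2 3, then 2 3.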

To get the frequencies

table(sapply(l1, paste, collapse=''))[i4]
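
Note that `table()` orders its result by name, while `i4` indexes `l2`, so the two line up for this example but need not in general. As a hedged alternative, the sketch below counts occurrences in the same order as `l2`, so any ordering index computed for `l2` can be applied to the counts directly. The comma separator and the names `key`, `k1`, `k2` and `freq` are illustrative choices, not from the original answer.

```r
vec <- list(c(1, 2), c(1, 2, 3), c(1, 2), c(2, 3), c(1, 3, 2))

l1 <- lapply(vec, sort)      # permutations become identical
l2 <- l1[!duplicated(l1)]    # unique elements, in order of first appearance

# Build one string key per element; the separator keeps c(1, 23) and
# c(12, 3) distinct, which collapse = '' would not.
key <- function(x) paste(x, collapse = ",")
k1 <- vapply(l1, key, character(1))
k2 <- vapply(l2, key, character(1))

# Count how often each unique key occurs, aligned with l2.
freq <- tabulate(match(k1, k2), nbins = length(k2))
names(freq) <- k2
freq
```

Because `freq` is aligned with `l2`, `freq[i4]` gives the counts in sorted order for any index `i4` that orders `l2`.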
akrun
  • Thanks a lot. What about calculating the number of duplicates for each unique entry? – Ankur Lahiri Dec 21 '15 at 12:32
  • @James1991 Updated the post, perhaps it helps. – akrun Dec 21 '15 at 12:45
  • Thanks again. This helps a lot. One last thing. What if the list contains alphanumeric values instead of only numeric? Then as.numeric() would return NAs. However I would still need it sorted based on the list order. Many thanks again. @akrun – Ankur Lahiri Dec 21 '15 at 12:54
  • @James1991 In that case use `library(gtools)` and, instead of converting to `numeric`, try `i4 <- mixedorder(sapply(l3, paste, collapse=''))` – akrun Dec 21 '15 at 12:56
  • Following up with a similar type of question, how do I write this vector of lists from R to a csv file? Thanks in advance. – Ankur Lahiri Dec 22 '15 at 10:44
  • @James1991 If the lengths of the list elements are not the same, one way is `invisible(capture.output(l2[i4], file = 'yourfile.csv'))` – akrun Dec 22 '15 at 10:48
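
For the CSV question in the last two comments, `capture.output()` writes the printed form of the list rather than comma-separated rows. One possible sketch, assuming one row per list element with short rows padded by empty fields is acceptable, is shown below. The object `l2` is recreated here so the block runs on its own, and the temporary file stands in for a real path.

```r
# Unique, sorted elements as produced in the answer above.
l2 <- list(c(1, 2), c(1, 2, 3), c(2, 3))

# Pad every element with NA up to the longest length and stack into a matrix.
width <- max(lengths(l2))
padded <- do.call(rbind, lapply(l2, `length<-`, width))

# Write one row per element; NA is written as an empty field.
out_file <- tempfile(fileext = ".csv")  # replace with a real path
write.table(padded, file = out_file, sep = ",", na = "",
            row.names = FALSE, col.names = FALSE)

readLines(out_file)
```

Rows shorter than the longest element end with empty trailing fields, which most CSV readers interpret as missing values.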