I have a list of character vectors:

my.list <- list(e1 = c("a","b","c","k"),e2 = c("b","d","e"),e3 = c("t","d","g","a","f"))

I'm looking for a function that, for any character appearing more than once across the list's vectors, keeps only its first appearance. Within a single vector, a character can appear only once.

The result list for this example would therefore be:

res.list <- list(e1 = c("a","b","c","k"),e2 = c("d","e"),e3 = c("t","g","f"))

Note that an entire vector in the list may be eliminated, so the resulting list can have fewer elements than the input list.

dan

3 Answers

We can unlist the list, build a logical index with duplicated, relist that index into the structure of 'my.list', and then extract the elements of 'my.list' with it.

un <- unlist(my.list)
res <- Map(`[`, my.list, relist(!duplicated(un), skeleton = my.list))
identical(res, res.list)
#[1] TRUE
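
The question also notes that a whole vector may end up eliminated. The sketch below is a hedged illustration rather than part of the original answer. It shows the intermediate logical index, then drops empty elements with lengths(). The names keep, other.list, keep2 and res2 are hypothetical and introduced only for this example.

```r
my.list <- list(e1 = c("a","b","c","k"), e2 = c("b","d","e"), e3 = c("t","d","g","a","f"))

# Logical index with the same shape as my.list: TRUE marks a first appearance
un <- unlist(my.list)
keep <- relist(!duplicated(un), skeleton = my.list)
res <- Map(`[`, my.list, keep)

# Hypothetical input in which every value of e2 has already appeared in e1
other.list <- list(e1 = c("a","b"), e2 = c("b","a"), e3 = c("c","a"))
keep2 <- relist(!duplicated(unlist(other.list)), skeleton = other.list)
res2 <- Map(`[`, other.list, keep2)

# Drop elements that became empty (e2 here)
res2 <- res2[lengths(res2) > 0]
```

Subsetting with lengths(res2) > 0 keeps the names of the surviving elements.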
akrun

Here is an alternative using mapply with setdiff and Reduce.

# make a copy of my.list
res.list <- my.list
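# take set difference between contents of list elements and accumulated elements
# SIMPLIFY = FALSE keeps the result a list even when all results have equal length
res.list[-1] <- mapply(setdiff, res.list[-1],
                       head(Reduce(c, my.list, accumulate = TRUE), -1),
                       SIMPLIFY = FALSE)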

The first element of the list is kept as is. Each subsequent element is compared, via setdiff, against the cumulative vector of all elements that precede it, which Reduce produces with c and the accumulate=TRUE argument. head(..., -1) drops the final list item, which contains all elements, so that the lengths align.
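
To make the accumulation step concrete, here is a small sketch of what Reduce returns for the question's list. The names acc and seen.before are hypothetical and used only for illustration.

```r
my.list <- list(e1 = c("a","b","c","k"), e2 = c("b","d","e"), e3 = c("t","d","g","a","f"))

# Running concatenation: item i holds everything from elements 1..i
acc <- Reduce(c, my.list, accumulate = TRUE)

# Dropping the last item leaves, for e2 and e3, the values seen before each of them
seen.before <- head(acc, -1)

lengths(acc)      # sizes of the running concatenations
seen.before[[2]]  # everything that precedes e3
```

seen.before has one item fewer than my.list, which is why it pairs up with my.list[-1].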

This returns

res.list
$e1
[1] "a" "b" "c" "k"

$e2
[1] "d" "e"

$e3
[1] "t" "g" "f"

Note that in Reduce, we could replace c with function(x, y) unique(c(x, y)) and accomplish the same ultimate output.
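
A minimal sketch of that variant, assuming the same my.list as in the question. It uses Map in place of mapply so that the result is always a list. The names acc.unique and res are hypothetical.

```r
my.list <- list(e1 = c("a","b","c","k"), e2 = c("b","d","e"), e3 = c("t","d","g","a","f"))

# Accumulate only distinct values, so the running vectors stay short
acc.unique <- Reduce(function(x, y) unique(c(x, y)), my.list, accumulate = TRUE)

res <- my.list
# Each later element minus everything seen in the elements before it
res[-1] <- Map(setdiff, my.list[-1], head(acc.unique, -1))
```

The output is the same because setdiff ignores repeated values in its second argument.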

lmo

I found the solutions here very complex for my understanding and sought a simpler technique. Suppose you have the following list.

my_list <- list(a = c(1,2,3,4,5,5), b = c(1,2,2,3,3,4,4),
                d = c("Mary", "Mary", "John", "John"))

The following much simpler piece of code removes the duplicates within each vector (it does not compare values across vectors).

lapply(my_list, unique)  # lapply always returns a list; sapply may simplify to a matrix

You will end up with the following.

$a
[1] 1 2 3 4 5

$b
[1] 1 2 3 4

$d
[1] "Mary" "John"
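
For comparison, here is a hedged sketch of the same call applied to the list from the question. Because unique works on one vector at a time, values repeated across vectors are left in place. The name per.vector is hypothetical.

```r
my.list <- list(e1 = c("a","b","c","k"), e2 = c("b","d","e"), e3 = c("t","d","g","a","f"))

# unique() is applied to each vector separately
per.vector <- lapply(my.list, unique)

# No vector has internal duplicates, so the list comes back unchanged
identical(per.vector, my.list)
```

This suits data with duplicates inside each vector, as in my_list above.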

There is beauty in simplicity!

John Karuitha