
I have two long lists, A and B, of the same length, whose fields contain different numbers of elements:
Each field of A can contain several elements, and an element can recur within the same field.
Each field of B contains either exactly one element or nothing, i.e. "character(0)".
A also contains some empty fields, but for these records there is always an element in B, so no record is empty in both A and B.
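These assumptions about the data can be checked before combining anything. The following is a minimal sketch, assuming the data are stored exactly as in the machine-readable update further down; `check_inputs()` is a hypothetical helper name, and base R's `lengths()` returns the length of each list component.

```r
# Example data from the question's update
A <- list(c("JAMES", "JAMES"),
          c("JOHN", "ROBERT"),
          c("WILLIAM", "MICHAEL", "WILLIAM", "DAVID", "WILLIAM"),
          character(0))
B <- list("RICHARD", "JOHN", character(0), "CHARLES")

# Hypothetical helper: TRUE only if the stated assumptions hold
check_inputs <- function(A, B) {
  length(A) == length(B) &&                     # same number of records
    all(lengths(B) <= 1L) &&                    # B holds at most one element per record
    !any(lengths(A) == 0L & lengths(B) == 0L)   # no record is empty in both lists
}

check_inputs(A, B)   # TRUE for the example data
```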
I want to combine the elements of A and B into a new list of the same length, C, according to the following rules:

  • All elements from A have to be present in C, including any recurrences within the same field.
  • If B contains an element that isn't already present in A for the same record, it is added to C as well.
  • But if B contains an element that is already present in A for the same record, it is ignored.
  • If A has an empty field, the element from B for this record is added to C.
  • If B has an empty field, the element(s) from A for this record are added to C.
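Taken together, the five rules amount to this: keep each record of A unchanged and append whatever the matching record of B contains that is not already in it. The following is a minimal sketch of that reading using base R's `setdiff()` and `Map()`. It is an alternative idea, not the method used in the answer below, and `combine_record()` is a hypothetical name. A is only passed through `c()`, so its recurrences are never touched.

```r
A <- list(c("JAMES", "JAMES"),
          c("JOHN", "ROBERT"),
          c("WILLIAM", "MICHAEL", "WILLIAM", "DAVID", "WILLIAM"),
          character(0))
B <- list("RICHARD", "JOHN", character(0), "CHARLES")

# Keep a as is (duplicates included), then append the elements of b not found in a.
# If a is empty, setdiff(b, a) is just b; if b is empty or already in a, nothing is appended.
combine_record <- function(a, b) c(a, setdiff(b, a))

C <- Map(combine_record, A, B)   # list of the same length as A
C
```

This relies on B holding at most one element per record. If B could contain repeated elements, `setdiff()` would also collapse those.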

This is an example of how these lists begin:

> A  
 [1] "JAMES" "JAMES"  
 [2] "JOHN" "ROBERT"  
 [3] "WILLIAM" "MICHAEL" "WILLIAM" "DAVID" "WILLIAM"  
 [4] character(0)  
...  
> B  
 [1] "RICHARD"  
 [2] "JOHN"  
 [3] character(0)  
 [4] "CHARLES"  
...  

This is the correct output I'm looking for:

> C  
 [1] "JAMES" "JAMES" "RICHARD"  
 [2] "JOHN" "ROBERT"  
 [3] "WILLIAM" "MICHAEL" "WILLIAM" "DAVID" "WILLIAM"  
 [4] "CHARLES"  
... 

I tried, e.g.:

C <- sapply(mapply(union, A,B), setdiff, character(0))  

But this deleted the recurrences from A, unfortunately:

> C  
 [1] "JAMES" "RICHARD"  
 [2] "JOHN" "ROBERT"  
 [3] "WILLIAM" "MICHAEL" "DAVID"  
 [4] "CHARLES"  
...  
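The recurrences disappear because of the set functions themselves. R's documentation for `union()` and `setdiff()` states that they discard duplicated values in their arguments, so both steps of this attempt deduplicate. Here is a small illustration using the first and third records:

```r
a <- c("JAMES", "JAMES")   # record 1 of A
b <- "RICHARD"             # record 1 of B

# union() drops the repeated "JAMES": returns "JAMES" "RICHARD"
u <- union(a, b)

# setdiff(x, character(0)) removes nothing from x, yet still deduplicates it:
# returns "WILLIAM" "MICHAEL" "DAVID"
w <- setdiff(c("WILLIAM", "MICHAEL", "WILLIAM", "DAVID", "WILLIAM"), character(0))

# plain c() concatenates without deduplicating: "JAMES" "JAMES" "RICHARD"
k <- c(a, b)
```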

Can anybody tell me, please, how to combine these two lists, preserve the recurrences from A, and achieve the output I desire?

Thank you very much in advance!

Update: Machine readable data:

A <- list(c("JAMES","JAMES"),
          c("JOHN","ROBERT"), 
          c("WILLIAM","MICHAEL","WILLIAM","DAVID","WILLIAM"),  
          character(0))
B <- list("RICHARD","JOHN",character(0),"CHARLES")
Gavin Simpson
user0815
    Could you provide the data in a form that other people can read in? This would help them get the example running and give them more time to find a good solution for you. Have a look here at how you could do that: [SO](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example). In particular, have a look at `dput`. – Christoph_J Jun 14 '12 at 10:15
  • Thank you very much for your comment and advice! Next time I'll heed it as well as I can. – user0815 Jun 14 '12 at 12:19
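Following the `dput` suggestion in the comment above, here is a short sketch of how it produces text that can be pasted straight into a question. The round trip through `capture.output()` and `parse()` only confirms that the printed text recreates the same object.

```r
B <- list("RICHARD", "JOHN", character(0), "CHARLES")

# Prints R code that recreates B: list("RICHARD", "JOHN", character(0), "CHARLES")
dput(B)

# Round trip: capture the printed text, parse it, and evaluate it again
txt <- paste(capture.output(dput(B)), collapse = "\n")
B_copy <- eval(parse(text = txt))
identical(B, B_copy)   # TRUE
```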

1 Answer


Here is your snippet of data, in reproducible form:

A <- list(c("JAMES","JAMES"),
          c("JOHN","ROBERT"), 
          c("WILLIAM","MICHAEL","WILLIAM","DAVID","WILLIAM"),  
          character(0))
B <- list("RICHARD","JOHN",character(0),"CHARLES")

You were close with mapply(). I got the desired output by using c() to concatenate the corresponding elements of A and B, but I had to handle the special cases (an empty element in A or B, or B's element already being present in A), so I came up with this:

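foo <- function(...) {
    l1 <- length(..1)   # number of elements in the current record of A
    l2 <- length(..2)   # number of elements in the current record of B (0 or 1)
    out <- character(0)
    if(l1 > 0) {
        if(l2 > 0) {
            # keep A as is if B's element is already there, otherwise append it
            out <- if(..2 %in% ..1)
                ..1
            else
                c(..1, ..2)
        } else {
            out <-  ..1   # B is empty: use A unchanged
        }
    } else {
        out <-  ..2   # A is empty: use B
    }
    out
}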

We can refer to the individual arguments in ... using the ..n placeholders; inside mapply(), ..1 is the current element of A and ..2 is the corresponding element of B. Of course, foo() only works with two lists, but it doesn't enforce this or do any checking, just to keep things simple. foo() also needs to handle the cases where either or both elements are character(0), which I now think it does.
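If you want foo() to enforce the two-list assumption, base R's `...length()` (available since R 3.5.0) counts the arguments passed through `...`. The following sketch shows such a variant under the hypothetical name `foo_checked()`. The nested `if`s are flattened into early `return()`s, but the decision logic is intended to match foo().

```r
# Hypothetical variant of foo() that stops unless exactly two arguments are supplied
foo_checked <- function(...) {
  stopifnot(...length() == 2L)
  a <- ..1   # current element of A
  b <- ..2   # corresponding element of B (assumed to have length 0 or 1)
  if (length(a) == 0L) return(b)               # A empty: take B
  if (length(b) == 0L || b %in% a) return(a)   # B empty or already in A: keep A
  c(a, b)                                      # otherwise append B's element
}

A <- list(c("JAMES", "JAMES"),
          c("JOHN", "ROBERT"),
          c("WILLIAM", "MICHAEL", "WILLIAM", "DAVID", "WILLIAM"),
          character(0))
B <- list("RICHARD", "JOHN", character(0), "CHARLES")

res <- mapply(foo_checked, A, B, SIMPLIFY = FALSE)
```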

When we use that in the mapply() call I get:

> mapply(foo, A, B)
[[1]]
[1] "JAMES"   "JAMES"   "RICHARD"

[[2]]
[1] "JOHN"   "ROBERT"

[[3]]
[1] "WILLIAM" "MICHAEL" "WILLIAM" "DAVID"   "WILLIAM"

[[4]]
[1] "CHARLES"
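One caveat with `mapply()`: its default `SIMPLIFY = TRUE` turns the result into a matrix whenever every combined record happens to have the same length. The example data avoids this because the lengths differ, but other data may not. The following sketch uses hypothetical data `A2`/`B2` and a compact rewrite of foo() with the same logic. It compares `mapply()` with `Map()`, which calls `mapply()` with `SIMPLIFY = FALSE` and always returns a list.

```r
# Compact rewrite of foo(), intended to behave the same way
foo <- function(...) {
  if (length(..1) == 0L) return(..2)
  if (length(..2) == 0L || ..2 %in% ..1) return(..1)
  c(..1, ..2)
}

# Hypothetical data in which every combined record ends up with two elements
A2 <- list(c("X", "Y"), "Z")
B2 <- list(character(0), "W")

m <- mapply(foo, A2, B2)   # simplified to a 2 x 2 character matrix
is.matrix(m)               # TRUE

l <- Map(foo, A2, B2)      # stays a list: c("X", "Y") and c("Z", "W")
```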

An lapply() version may be easier to follow than the abstract ..n placeholders, but it uses essentially the same code. Here is a new function that works with A and B directly; we iterate over the indices of the elements of A (1, 2, ..., length(A)) as generated by seq_along():

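foo2 <- function(ind, A, B) {
    l1 <- length(A[[ind]])   # size of record ind in A
    l2 <- length(B[[ind]])   # size of record ind in B (0 or 1)
    out <- character(0)
    if(l1 > 0) {
        if(l2 > 0) {
            # append B's element only if A doesn't already contain it
            out <- if(B[[ind]] %in% A[[ind]]) {
                A[[ind]]
            } else {
                c(A[[ind]], B[[ind]])
            }
        } else {
            out <- A[[ind]]   # B is empty: keep A
        }
    } else {
        out <- B[[ind]]       # A is empty: take B
    }
    out
}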

which is called like this:

> lapply(seq_along(A), foo2, A = A, B = B)
[[1]]
[1] "JAMES"   "JAMES"   "RICHARD"

[[2]]
[1] "JOHN"   "ROBERT"

[[3]]
[1] "WILLIAM" "MICHAEL" "WILLIAM" "DAVID"   "WILLIAM"

[[4]]
[1] "CHARLES"
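Using seq_along() rather than 1:length(A) also keeps the lapply() version safe when the input list is empty. The following is a small illustration with a hypothetical empty list `A0`:

```r
A0 <- list()   # hypothetical empty input

idx_safe <- seq_along(A0)     # integer(0): lapply() simply returns list()
idx_unsafe <- 1:length(A0)    # c(1L, 0L): lapply() would try A0[[1]] and fail

res <- lapply(idx_safe, function(i) A0[[i]])
```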
Gavin Simpson
  • Thank you very much for your solution which works perfectly! In addition to that thank you for your helpful edit/update to my question. – user0815 Jun 14 '12 at 12:18