Loop of a loop in R

Question

This is my problem: I have two different data frames (A and B). Each column of each data frame is a geographical locality, and the rows hold the species found in that locality. I need to intersect the species list of locality 1 in data frame A with the species list of every locality in data frame B. To do this I wrote a loop like this:

res <- list()
for (i in seq_along(B)) {
  res[[i]] <- intersect(A[[1]], B[[i]])  # [[ ]] extracts each column as a vector
}

Now I have to repeat the same loop for localities 2, 3, 4, 5, 6, ... of A; that is, I have to intersect every locality of A with every locality of B.

Thank you.

Welcome to SO. Questions are easier to answer if you provide sample data. — Richie Cotton, Jan 11 '12 at 10:48

score 5 · Answer 1 · edited May 23 '17 at 12:29

Here is an alternative to nested loops that uses lapply().

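If you have a large dataset, lapply() is often more concise than an explicit loop, and it can be noticeably faster than a for loop that grows its result object on every iteration. Note, though, that the *apply functions are not truly vectorized (they still call your function once per element), so compared with a for loop that writes into a preallocated result, the speed difference is usually modest.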
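A rough way to check the speed difference on your own machine is to time both versions on made-up data. In this sketch the list `loc` of 100 random "localities" (each holding 50 species ids drawn from 1:500) is purely hypothetical test data, and the timings you get will depend on your machine and R version.

```r
# Hypothetical test data: 100 "localities", each holding 50 species ids
set.seed(1)
loc <- replicate(100, sample(1:500, 50), simplify = FALSE)

# Version 1: nested lapply()
t_lapply <- system.time(
  res_lapply <- lapply(loc, function(x) lapply(loc, function(y) intersect(x, y)))
)

# Version 2: nested for loops writing into preallocated lists
t_for <- system.time({
  res_for <- vector("list", length(loc))
  for (i in seq_along(loc)) {
    res_for[[i]] <- vector("list", length(loc))
    for (j in seq_along(loc)) {
      res_for[[i]][[j]] <- intersect(loc[[i]], loc[[j]])
    }
  }
})

# Elapsed seconds for each version
print(c(lapply = t_lapply[["elapsed"]], for_loop = t_for[["elapsed"]]))

# Both versions build exactly the same nested list
print(identical(res_lapply, res_for))
```

Expect the two timings to be of the same order; where lapply() tends to pay off is against loops that grow their result with c() or append() on each pass.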

I'll walk through an example and you can perhaps adapt it to your dataset.

First, we make a sample 3x3 data frame called df, with columns a, b and c, and rows d, e and f:

> df <- data.frame(a = sample(3), b = sample(3), c = sample(3))
> rownames(df) <- c('d','e','f')

Let's look at df and its transpose t(df):

> df
  a b c
d 3 1 3
e 1 3 1
f 2 2 2

> t(df)
  d e f
a 3 1 2
b 1 3 2
c 3 1 2
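Because sample() returns a different permutation on each run, your df will probably not match the one printed above. To follow along with exactly these values (and the results below), you can build the same data frame by hand; this is only a convenience for reproducing the example.

```r
# The same 3x3 data frame as printed above, built without sample()
df <- data.frame(a = c(3, 1, 2),
                 b = c(1, 3, 2),
                 c = c(3, 1, 2),
                 row.names = c('d', 'e', 'f'))

# Nested lapply(): every column of df against every column of t(df)
result <- lapply(df, function(x)
  lapply(as.data.frame(t(df)), function(y) intersect(x, y)))

print(result$a$d)  # [1] 3 1
print(result$b$f)  # [1] 2
```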

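Let's say we want to intersect the column vectors of df with those of t(df). Because t() returns a matrix, and lapply() over a matrix would visit each single element rather than each column, we convert it back with as.data.frame(). We then use nested lapply() statements to run intersect() on every pair of column vectors: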

> result <- lapply(df, function(x) lapply(as.data.frame(t(df)), function(y) intersect(x,y)))

The result is a nested list of the intersections:

> is.list(result)
[1] TRUE

> print(result)
$a
$a$d
[1] 3 1

$a$e
[1] 3 1

$a$f
[1] 2


$b
$b$d
[1] 1 3

$b$e
[1] 1 3

$b$f
[1] 2


$c
$c$d
[1] 3 1

$c$e
[1] 3 1

$c$f
[1] 2

Let's look at df and t(df) again, and see how to read these results:

> df
  a b c
d 3 1 3
e 1 3 1
f 2 2 2

> t(df)
  d e f
a 3 1 2
b 1 3 2
c 3 1 2

Let's look at df$a intersected with each column of t(df) (d, e and f):

$a
$a$d
[1] 3 1

Intersecting the vectors a and d: {3,1,2} ∩ {3,1,3} = {3,1}

$a$e
[1] 3 1

Again, with vectors a and e: {3,1,2} ∩ {1,3,1} = {3,1}

$a$f
[1] 2

And lastly, with vectors a and f: {3,1,2} ∩ {2,2,2} = {2}

The remaining elements of result ($b and $c) are read the same way.
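intersect() treats its arguments as sets: the result keeps the order of the first argument and drops duplicates. A few small checks that mirror the hand calculations above:

```r
a <- c(3, 1, 2)
d <- c(3, 1, 3)
f <- c(2, 2, 2)

intersect(a, d)              # 3 1
intersect(a, f)              # 2
intersect(d, a)              # 3 1 (the repeated 3 in d appears only once)
intersect(c(1, 2), c(3, 4))  # numeric(0): nothing in common
```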

To extend this to your dataset, where the columns of both A and B are localities, use A in place of df in the outer lapply() and B in place of as.data.frame(t(df)) in the inner lapply(), so that every locality of A is intersected with every locality of B.
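Applied to the question's setting, where every column of A and of B is a locality holding species names, the outer lapply() runs over A and the inner one over B. The two small data frames below are made up for illustration (stringsAsFactors = FALSE only matters before R 4.0.0). With real data, localities usually differ in species count, so if columns are padded with NA, NA can show up as a "shared species" unless you drop it first, e.g. with x[!is.na(x)].

```r
# Hypothetical example data: each column is a locality, each value a species
A <- data.frame(loc1 = c("rats", "mice", "foxes"),
                loc2 = c("mice", "voles", "hares"),
                stringsAsFactors = FALSE)
B <- data.frame(locX = c("rats", "voles", "foxes"),
                locY = c("hares", "mice", "owls"),
                stringsAsFactors = FALSE)

# Every locality of A against every locality of B
res <- lapply(A, function(x) lapply(B, function(y) intersect(x, y)))

res$loc1$locX  # returns c("rats", "foxes")
res$loc2$locY  # returns c("mice", "hares")
```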

To break down the nested lapply() statement, start with the inner lapply():

lapply(as.data.frame(t(df)), function(y) ... )

This means that each column vector of t(df) (converted to a data frame, so its columns are d, e and f) is passed in turn to function(y) as y. We'll come back to ... in a second.

Now let's look at the outer lapply():

lapply(df, function(x) ... )

This means that each column vector of df (columns a, b and c) is passed in turn to function(x) as x.

Now let's explain the ... placeholders.

The outer ... can be any function of x, such as length() or sum(), or even another lapply(). The inner lapply() has its own function with its own variable y, and because that inner function is defined inside the outer one, it can also see x (R's lexical scoping). So the inner ... can run a function on both x and y.

So that's what we do: for every column vector in df, we run a function on that vector and on every column vector of the transpose t(df). In our example, the function we run on x and y is intersect():

> result <- lapply(df, function(x) lapply(as.data.frame(t(df)), function(y) intersect(x,y)))
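The ... bodies can be swapped for other functions. In this sketch the outer body is simply sum(x), and in the nested version the inner body is length(intersect(x, y)), which uses the outer x from inside the inner function. Swapping lapply() for sapply() simplifies the counts into a matrix. The df is the one printed above, rebuilt by hand so the sketch is reproducible.

```r
df <- data.frame(a = c(3, 1, 2), b = c(1, 3, 2), c = c(3, 1, 2),
                 row.names = c('d', 'e', 'f'))
tdf <- as.data.frame(t(df))

# Outer body only: one value per column of df
col_sums <- sapply(df, function(x) sum(x))

# Nested bodies: the inner function uses both y and the outer x
counts <- sapply(df, function(x)
  sapply(tdf, function(y) length(intersect(x, y))))

print(col_sums)  # a, b and c all sum to 6
print(counts)    # rows d, e, f (columns of t(df)); columns a, b, c (columns of df)
```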

This is a very interesting procedure. It's the first approach I tried, but only now do I understand how lapply works. The explanation is very useful. Thanks — user1142777, Jan 11 '12 at 13:16

DrDom · Answer 2 · 2012-01-11T15:18:17.467

It's difficult to fully understand what you want to obtain as a result, but if I have guessed your needs correctly, the code below will do it. It could of course be optimized further, since it may be slow on big datasets.

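res <- vector("list", ncol(A))          # one element per locality (column) of A
for (i in seq_len(ncol(A))) {
  res[[i]] <- vector("list", ncol(B))   # one element per locality (column) of B
  for (j in seq_len(ncol(B))) {
    # species shared by locality i of A and locality j of B
    res[[i]][[j]] <- intersect(A[, i], B[, j])
  }
}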

To access a result, use

res[[column_index_in_A]][[column_index_in_B]]
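If you would rather look results up by locality name than by column index, one option is to name the lists as they are built. This sketch runs the same double loop on two small, made-up data frames of species names.

```r
# Hypothetical example data: columns are localities, values are species
A <- data.frame(London = c("rats", "mice"), Leeds = c("foxes", "mice"),
                stringsAsFactors = FALSE)
B <- data.frame(York = c("mice", "owls"), Bath = c("rats", "foxes"),
                stringsAsFactors = FALSE)

res <- vector("list", ncol(A))
names(res) <- names(A)                # outer list named after A's localities
for (i in seq_len(ncol(A))) {
  res[[i]] <- vector("list", ncol(B))
  names(res[[i]]) <- names(B)         # inner lists named after B's localities
  for (j in seq_len(ncol(B))) {
    res[[i]][[j]] <- intersect(A[, i], B[, j])
  }
}

res[["London"]][["Bath"]]  # same element as res[[1]][[2]]
```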

score 1 · Answer 3 · answered Jan 11 '12 at 11:57

Here's a wild guess at your data:

A <- data.frame(
  London     = c(TRUE, TRUE, FALSE),
  Manchester = c(FALSE, TRUE, FALSE),
  Birmingham = c(TRUE, FALSE, TRUE),
  row.names  = c("rats", "mice", "foxes")
)

B <- data.frame(
  London     = c(TRUE, FALSE, FALSE),
  Manchester = c(TRUE, TRUE, TRUE),
  Birmingham = c(TRUE, TRUE, FALSE),
  row.names  = c("rats", "mice", "foxes")
)


> A
      London Manchester Birmingham
rats    TRUE      FALSE       TRUE
mice    TRUE       TRUE      FALSE
foxes  FALSE      FALSE       TRUE
> B
      London Manchester Birmingham
rats    TRUE       TRUE       TRUE
mice   FALSE       TRUE       TRUE
foxes  FALSE       TRUE      FALSE

In this case, to find species that exist in the same location in both datasets, you just need

as.matrix(A) & as.matrix(B)
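The element-wise & above only compares each locality in A with the same-named locality in B. With presence/absence data like this, the number of species shared by every locality of A and every locality of B (the intersection sizes the question's loops compute) can be obtained in one matrix product after converting TRUE/FALSE to 1/0. The data frames are the guessed A and B from this answer; crossprod(Am, Bm) computes the same product as t(Am) %*% Bm.

```r
A <- data.frame(London = c(TRUE, TRUE, FALSE),
                Manchester = c(FALSE, TRUE, FALSE),
                Birmingham = c(TRUE, FALSE, TRUE),
                row.names = c("rats", "mice", "foxes"))
B <- data.frame(London = c(TRUE, FALSE, FALSE),
                Manchester = c(TRUE, TRUE, TRUE),
                Birmingham = c(TRUE, TRUE, FALSE),
                row.names = c("rats", "mice", "foxes"))

# Convert TRUE/FALSE to 1/0 so the matrix product counts shared species
Am <- as.matrix(A) * 1
Bm <- as.matrix(B) * 1

# shared[i, j] = number of species present in locality i of A and locality j of B
shared <- t(Am) %*% Bm
print(shared)
```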

Thanks, but I need to focus on the intersection of the faunal lists, as I need to compute a similarity index — user1142777, Jan 11 '12 at 13:08
