Here is an approach similar to nested loops that uses lapply().
If you have a large dataset, lapply() can give you a noticeable speed improvement over explicit loops, particularly loops that grow their result one element at a time. The functions in the *apply family are not truly vectorized (they still call your function once per element), but they preallocate their output and are usually more concise, so they are worth using where they fit.
I'll walk through an example that you can adapt to your dataset.
First, we make a sample 3x3 data frame called df, with columns a, b and c, and rows d, e and f:
> df <- data.frame(a = sample(3), b = sample(3), c = sample(3))
> rownames(df) <- c('d','e','f')
Let's look at df and its transpose t(df):
> df
  a b c
d 3 1 3
e 1 3 1
f 2 2 2
> t(df)
  d e f
a 3 1 2
b 1 3 2
c 3 1 2
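Because sample() returns a random permutation, your own df will usually hold different values from the ones printed above. As a minimal sketch for following along, the same data frame can be built with the printed values written out explicitly. The name tdf is introduced here only for illustration, and the values are stored as doubles rather than the integers that sample() returns:

```r
# Build the data frame with the exact values printed above,
# instead of relying on the random output of sample().
df <- data.frame(a = c(3, 1, 2), b = c(1, 3, 2), c = c(3, 1, 2))
rownames(df) <- c("d", "e", "f")

# t() returns a matrix; as.data.frame() turns it back into a data frame
# whose columns (d, e, f) can be iterated over with lapply().
tdf <- as.data.frame(t(df))  # tdf is a hypothetical helper name

print(df)
print(tdf)
```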
Let's say we want to intersect the column vectors of df with those of t(df). We use nested lapply() calls to run intersect() on every pair of column vectors, one from df and one from the transpose t(df):
> result <- lapply(df, function(x) lapply(as.data.frame(t(df)), function(y) intersect(x,y)))
The result is a list of lists, holding one intersection for each pair of columns:
> is.list(result)
[1] TRUE
> print(result)
$a
$a$d
[1] 3 1
$a$e
[1] 3 1
$a$f
[1] 2
$b
$b$d
[1] 1 3
$b$e
[1] 1 3
$b$f
[1] 2
$c
$c$d
[1] 3 1
$c$e
[1] 3 1
$c$f
[1] 2
Let's look at df and t(df) again, and see how to read these results:
> df
  a b c
d 3 1 3
e 1 3 1
f 2 2 2
> t(df)
  d e f
a 3 1 2
b 1 3 2
c 3 1 2
Let's look at df$a intersected with the columns d, e and f of t(df):
$a
$a$d
[1] 3 1
Intersecting the vectors a and d: {3,1,2} ∩ {3,1,3} = {3,1}
$a$e
[1] 3 1
Again, with vectors a and e: {3,1,2} ∩ {1,3,1} = {3,1}
$a$f
[1] 2
And lastly, with vectors a and f: {3,1,2} ∩ {2,2,2} = {2}
The other items in result follow the same pattern.
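To make the reading of the nested list concrete, here is a short sketch that pulls individual intersections out of result by name. It assumes the example values printed above (written out explicitly so the block runs on its own); a_f is just an illustrative variable name:

```r
# Rebuild the example with the values printed above.
df <- data.frame(a = c(3, 1, 2), b = c(1, 3, 2), c = c(3, 1, 2))
rownames(df) <- c("d", "e", "f")
result <- lapply(df, function(x)
  lapply(as.data.frame(t(df)), function(y) intersect(x, y)))

# Outer names come from the columns of df,
# inner names from the columns of t(df).
print(names(result))    # "a" "b" "c"
print(names(result$a))  # "d" "e" "f"

# Pull out a single intersection: column a of df with column f of t(df).
a_f <- result[["a"]][["f"]]
print(a_f)

# The same entry computed directly, for comparison.
print(intersect(df$a, c(2, 2, 2)))
```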
To extend this to your dataset, think of your data-frame columns as localities, and the transposed-data-frame columns as your species. Then use lapply(), as shown above.
To break down the nested lapply() statement, start with the inner lapply():
lapply(as.data.frame(t(df)), function(y) ... )
This means that each column vector in t(df) (the columns d, e and f) is passed in turn to function(y) as the variable y. We'll come back to ... in a second.
Now let's look at the outer lapply():
lapply(df, function(x) ... )
This means that each column vector in df (the columns a, b and c) is passed in turn to function(x) as the variable x.
Now let's explain the ... placeholders.
The outer ... is any function of x: this can be length(), sum(), etc., or even another lapply(). The inner lapply() has its own function and variable name y, and because it is defined inside function(x), the inner ... can run a function on both x and y.
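As a small illustration of what can stand in for the placeholders, the sketch below first uses a plain summary function for the outer body, then a nested lapply() whose inner function uses both x and y. It assumes the example values printed earlier; col_sums and shared_counts are names made up for this sketch, and counting shared values with length() is just one possible choice of inner function:

```r
# Rebuild the example with the values printed earlier.
df <- data.frame(a = c(3, 1, 2), b = c(1, 3, 2), c = c(3, 1, 2))
rownames(df) <- c("d", "e", "f")

# Outer body as a simple function of x: the sum of each column of df.
col_sums <- lapply(df, function(x) sum(x))

# Outer body as another lapply(): the inner function can see both x and y.
# Here it counts the shared values instead of listing them.
shared_counts <- lapply(df, function(x)
  lapply(as.data.frame(t(df)), function(y) length(intersect(x, y))))

print(col_sums$a)
print(shared_counts$a$d)
print(shared_counts$a$f)
```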
So that's what we do: for every column vector in df, we run a function on that vector and every column vector in the transpose t(df). In our example, the function we run on x and y is intersect():
> result <- lapply(df, function(x) lapply(as.data.frame(t(df)), function(y) intersect(x,y)))
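For comparison, here is a sketch of the explicit nested loops that the lapply() statement replaces, again assuming the example values printed earlier. The names tdf and result_loop are introduced only for this illustration. The transpose is computed once and stored, rather than being recomputed for every column of df as in the one-liner:

```r
# Rebuild the example with the values printed earlier.
df <- data.frame(a = c(3, 1, 2), b = c(1, 3, 2), c = c(3, 1, 2))
rownames(df) <- c("d", "e", "f")
tdf <- as.data.frame(t(df))  # transpose once, outside the loops

# Nested lapply() version, using the stored transpose.
result <- lapply(df, function(x) lapply(tdf, function(y) intersect(x, y)))

# Equivalent explicit loops: the outer loop walks the columns of df,
# the inner loop walks the columns of the transpose.
result_loop <- list()
for (i in names(df)) {
  result_loop[[i]] <- list()
  for (j in names(tdf)) {
    result_loop[[i]][[j]] <- intersect(df[[i]], tdf[[j]])
  }
}

# Compare the two results.
print(isTRUE(all.equal(result, result_loop)))
```

Both versions visit the same pairs of columns; the lapply() form simply returns the named list directly instead of filling it in by hand.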