I'm working in R and I have code like this:

for (i in 1:10)
   for (j in 1:100)
        if (data[i] == paths[j,1])
            cluster[i,4] <- paths[j,2]

where:

  • data is a vector with 100 elements
  • paths is a matrix with 100 rows and 5 columns
  • cluster is a matrix with 100 rows and 5 columns

My question is: how could I avoid the use of "for" loops to iterate through the matrix? I don't know whether apply functions (lapply, tapply...) are useful in this case.

This becomes a problem when, for example, j goes up to 10000, because the execution time gets very long.

Thank you

albergali
  • I think something has been lost in translation here? – wkmor1 Jun 02 '10 at 11:54
  • Do you really intend to save the last matching 'paths' value in 'cluster'? – teucer Jun 02 '10 at 11:59
  • Yes, what Musa and wkmor1 said... Did you really mean for i to go to 10, testing only the first 10 items of the 100-item vector data? The general answer to your question is that you have to start thinking in vectors rather than individual items. There are vastly faster ways to do something like what you're doing, as soon as the intent makes sense. – John Jun 02 '10 at 14:02
  • Thank you, guys. What I want to do is save the values of the 2nd column of paths in column 4 of cluster when values from "data" are equal to values from "paths", and avoid the "for" statement, because when I have a lot of observations the computation time increases sharply. – albergali Jun 02 '10 at 19:28
  • And what do you want to do when they are not equal? Are they just ignored, or is column 4 of cluster already set to something that doesn't change unless this condition is met? (Sounds like a simple ifelse() call, no loops; check the help.) – John Jun 02 '10 at 21:52
  • @albergali The last line of your code should be `cluster[j,4] <- paths[j,2]` – Marek Jun 03 '10 at 22:56

2 Answers

The inner loop can be vectorized:

cluster[i,4] <- paths[max(which(data[i]==paths[,1])),2]

but check Musa's comment. I think you intended something else.
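One caveat with the one-liner above: when `data[i]` has no match, `which()` returns an empty vector and `max()` of that is `-Inf` (with a warning), which is not a meaningful row index. A minimal sketch of a guarded version follows; the small inputs are made up for illustration and are not the asker's data.

```r
# Made-up example inputs (assumptions for illustration only)
data    <- c(3, 7, 5, 9)
paths   <- cbind(c(5, 3, 5, 8), c(10, 20, 30, 40))
cluster <- matrix(0, nrow = 4, ncol = 5)

for (i in seq_along(data)) {
  hits <- which(data[i] == paths[, 1])   # all rows of paths matching data[i]
  if (length(hits) > 0) {
    # the last match wins, as in the original double loop
    cluster[i, 4] <- paths[max(hits), 2]
  }
  # no match: cluster[i, 4] is left untouched
}

cluster[, 4]
```

Rows without a match keep whatever value they had before, which is what the original double loop does as well.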

The second (outer) loop could be vectorized as well, by replicating vectors, but:

  1. if i only goes up to 100, the speed-up won't be large
  2. it will need more RAM
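One possible way to sketch the "replicating vectors" idea is with `outer()`, which builds the whole comparison matrix in one go. That matrix has `length(data)` times `nrow(paths)` entries, which is where the extra RAM goes. The inputs below are made up for illustration, and this is only one way to do it, not necessarily what was meant above.

```r
# Made-up example inputs (assumptions for illustration only)
data    <- c(3, 7, 5, 9)
paths   <- cbind(c(5, 3, 5, 8), c(10, 20, 30, 40))
cluster <- matrix(0, nrow = 4, ncol = 5)

# eq[i, j] is TRUE when data[i] == paths[j, 1]
eq <- outer(data, paths[, 1], "==")

# (row, col) pairs of all matches, listed column by column,
# i.e. in increasing order of j
hits <- which(eq, arr.ind = TRUE)

# When a row index i is repeated, the last assignment wins, so the
# largest matching j is kept, as in the original double loop
cluster[hits[, "row"], 4] <- paths[hits[, "col"], 2]

cluster[, 4]
```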

[edit] As I understand your comment, can't you just use logical indexing?

indx <- data==paths[, 1]
cluster[indx, 4] <- paths[indx, 2]
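
Note that this compares `data` and `paths[, 1]` position by position, so it assumes both have the same length and that row k of `data` is meant to be checked against row k of `paths`. A small sketch with made-up values:

```r
# Made-up example inputs (assumptions for illustration only)
data    <- c(5, 7, 5, 8)
paths   <- cbind(c(5, 3, 5, 8), c(10, 20, 30, 40))
cluster <- matrix(0, nrow = 4, ncol = 5)

indx <- data == paths[, 1]           # element-wise: TRUE FALSE TRUE TRUE
cluster[indx, 4] <- paths[indx, 2]   # copy only where the rows agree

cluster[, 4]                         # 10 0 30 40
```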
Marek

I think that both loops can be vectorized using the following:

cluster[na.omit(match(paths[1:100,1],data[1:10])),4] = paths[!is.na(match(paths[1:100,1],data[1:10])),2]
gd047
  • I wonder how the performance of your vectorized solution compares to the looping alternative. – Guido Jun 04 '10 at 06:59
  • @Guido In this particular case it's hard to say, because the results of the original loop and of gd047's solution differ, but in general the difference between a loop and vectorized code can be huge. Check my answer to http://stackoverflow.com/questions/2908822/speed-up-the-loop-operation-in-r, where you can go from hours to less than a second. – Marek Jun 04 '10 at 19:40
  • @Marek Using randomized test matrices I got equal cluster matrices with both methods. I checked the results using `all.equal(loop_sol,vect_sol)`. Which test matrices did you use that gave you different results? – gd047 Jun 04 '10 at 20:25
  • @gd047 Check this http://sites.google.com/site/fsh9rss8heh/ (too long for comment), I use R-2.10.1 – Marek Jun 04 '10 at 22:15
  • @Marek Thanks. You are right. In my examples there was never more than one match between data[i] and paths[j,1]. In the general case, where there is more than one, the match that is checked last dominates. I am not sure which one dominates in the vectorized version. Do you have any idea? – gd047 Jun 05 '10 at 06:17
  • @gd047 As stated in `help("match")`, it returns **positions of (first) matches**, so you could write your own version using rev: `match_last <- function(x,y) length(y)-match(x,rev(y))+1` – Marek Jun 05 '10 at 12:24
  • @Marek Nice idea but there's still a difference. – gd047 Jun 05 '10 at 14:28
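
For readers following this thread, here is a hedged sketch of one way `match_last` could be applied. It looks up each element of `data` in `paths[, 1]` (not the other way round) and is compared against the double loop on made-up inputs that include a repeated key and a repeated data value. It is an illustration under those assumptions, not a reconstruction of what either commenter actually ran.

```r
# Marek's helper from the comment above: position of the LAST match of x in y
match_last <- function(x, y) length(y) - match(x, rev(y)) + 1

# Made-up example inputs (assumptions for illustration only)
data  <- c(3, 7, 5, 5)
paths <- cbind(c(5, 3, 5, 8), c(10, 20, 30, 40))

# Reference: the double loop from the question, run over all elements
loop_sol <- matrix(0, nrow = 4, ncol = 5)
for (i in seq_along(data)) {
  for (j in seq_len(nrow(paths))) {
    if (data[i] == paths[j, 1]) {
      loop_sol[i, 4] <- paths[j, 2]
    }
  }
}

# Vectorized: for each data[i], the last matching row of paths (NA if none)
vect_sol <- matrix(0, nrow = 4, ncol = 5)
m  <- match_last(data, paths[, 1])
ok <- !is.na(m)
vect_sol[ok, 4] <- paths[m[ok], 2]

all.equal(loop_sol, vect_sol)
```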