Possible Duplicate:
Why are loops slow in R?
Consider the following task. A dataset has 40 variables for 20,000 "users", and each user has between 1 and 150 observations. All users' observations are stacked in a matrix called data, whose first column holds the user id. All ids are stored in a 20,000 x 1 matrix called userid.
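Consider the following R code:
```r
useridl <- length(userid)
itime <- proc.time()[3]   # elapsed time before the loop
for (i in seq_len(useridl)) {
  # rows of data belonging to user i; drop = FALSE keeps temp a matrix
  # even when the user has a single observation
  temp <- data[data[, 1] == userid[i], , drop = FALSE]
}
etime <- proc.time()[3]
etime - itime
```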
This code simply loops over the 20,000 users, each time creating temp, the subset of rows of data that belong to userid[i]. It takes about 6 minutes on a Mac Pro.
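The question does not include data, so here is a minimal sketch that builds a synthetic matrix of the same shape and times the same loop. Everything about the data is an assumption: values are random normals, ids run from 1 to n_users, and each user's observation count is drawn uniformly from 1 to 150. The helper make_data and its parameters are hypothetical. The default n_users = 500 keeps it quick; setting n_users = 20000 approximates the size described above (roughly 1.5 million rows, several hundred MB).

```r
# Hypothetical helper (not from the question): builds a matrix shaped like the
# one described above. Column 1 = user id, columns 2..n_vars = random values.
make_data <- function(n_users = 500, n_vars = 40, max_obs = 150, seed = 1) {
  set.seed(seed)
  n_obs <- sample.int(max_obs, n_users, replace = TRUE)  # 1..max_obs rows per user
  ids <- rep(seq_len(n_users), times = n_obs)             # users stacked by id
  vals <- matrix(rnorm(length(ids) * (n_vars - 1)), ncol = n_vars - 1)
  list(data = cbind(ids, vals),
       userid = matrix(seq_len(n_users), ncol = 1))
}

d <- make_data()
data <- d$data
userid <- d$userid

# The loop from the question, timed with system.time().
# Each iteration compares the id of every row against userid[i], so the
# total work grows with (number of users) x (number of rows).
timing <- system.time(
  for (i in seq_len(length(userid))) {
    temp <- data[data[, 1] == userid[i], , drop = FALSE]
  }
)
print(timing)
```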
In MATLAB, the same task
```matlab
tic
for i = 1:useridl
    temp = data(data(:,1) == userid(i), :);
end
toc
```
takes 1 minute.
Why is R so much slower? This is a standard task, and I am using matrices in both cases. Any ideas?
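For comparison, one commonly used alternative in R is to group the row indices by id in a single pass with split(), instead of rescanning the whole id column for every user. This is only a hedged sketch: the technique is not from the question, the tiny data set is made up, and rows_by_id is a hypothetical name. Each per-user subset then touches only that user's rows.

```r
# Tiny made-up example: 3 users, column 1 = id (rows need not be sorted by id).
data <- cbind(id = c(1, 1, 2, 3, 3, 3, 2),
              x  = c(10, 11, 20, 30, 31, 32, 21))
userid <- matrix(c(1, 2, 3), ncol = 1)

# One pass over the id column: a list of row indices, one element per id,
# named by the id values.
rows_by_id <- split(seq_len(nrow(data)), data[, 1])

for (i in seq_len(length(userid))) {
  # Look up this user's rows by name instead of comparing every row.
  idx <- rows_by_id[[as.character(userid[i])]]
  temp <- data[idx, , drop = FALSE]
}
```

Building rows_by_id costs one pass over the data, so the loop no longer repeats a full scan per user. How much this narrows the gap with MATLAB on the real data has not been measured here.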