I'm writing a script that has to build a large matrix. I want to take a vector of names and, for each name, get data from a different data frame, do some operations on it, and then return a vector of data for that name. For example:
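allNew=matrix(ncol=ncol(X)-1);
for(name in list)
{
  # rows of `all` matched for the current name
  tmpdata=all[grep(name,list$Names),];
  # per column: 1 if the column sum equals the number of rows, else 0
  data=(as.data.frame(apply(tmpdata[,2:(ncol(tmpdata)-1)],2,sum))==nrow(tmpdata))*1
  colnames(data)=name;
  data=t(data);
  # append this name's row to the result
  allNew=rbind(allNew,data);
}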
The list of names has about 10,000 entries, and for each name tmpdata has 1-5 rows. I'm running the code on my lab's Linux server, which has about 8 GB of RAM.
Somehow I feel this is taking a lot longer than it should: it takes a few minutes. How can I do this more efficiently?
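
To make the computation concrete, here is a small self-contained sketch with toy data. Everything in it is made up for illustration: the data frame `all_df`, its columns `x1`, `x2` and `extra`, and the vector `name_vec` are hypothetical stand-ins for my real objects. It also assumes exact name matching with `==` rather than `grep`, and it fills a preallocated matrix instead of growing one with `rbind()`, so it is a variant of the loop above rather than a copy of it.

```r
# Toy stand-in for the real data; column names and values are made up.
all_df <- data.frame(
  Names = c("a", "a", "b", "c", "c", "c"),
  x1    = c(1, 1, 1, 0, 1, 1),
  x2    = c(1, 0, 1, 1, 1, 1),
  extra = c(9, 9, 9, 9, 9, 9),  # last column, skipped as in the loop above
  stringsAsFactors = FALSE
)

name_vec   <- unique(all_df$Names)   # hypothetical vector of names
value_cols <- 2:(ncol(all_df) - 1)   # same column range as in the loop above

# Preallocate one row per name instead of calling rbind() in every iteration.
all_new <- matrix(
  0,
  nrow = length(name_vec),
  ncol = length(value_cols),
  dimnames = list(name_vec, colnames(all_df)[value_cols])
)

for (name in name_vec) {
  # Assumption: exact match on the name, not a regular-expression match.
  tmpdata <- all_df[all_df$Names == name, value_cols, drop = FALSE]
  # 1 where the column sum equals the number of rows for this name, else 0.
  all_new[name, ] <- (colSums(tmpdata) == nrow(tmpdata)) * 1
}

print(all_new)
```

With this toy input, a row holds a 1 in a column only when every row for that name has a 1 there. I don't know whether the repeated `rbind()` or the `grep()` call is the slow part in my real script, so the preallocation here is only a guess at one possible improvement.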