I've got some data involving repeated sales for a bunch of of cars with unique Ids. A car can be sold more than once.
Some of the Ids are erroneous however, so I'm checking, for each Id, if the size is recorded as the same over multiple sales. If it isn't, then I know that the Id is erroneous.
I'm trying to do this with the following code:
library("doMC")
Data <- data.frame(ID=c(15432,67325,34623,15432,67325,34623),Size=c("Big","Med","Small","Big","Med","Big"))
compare <- function(v) all(sapply( as.list(v[-1]), FUN=function(z) {isTRUE(all.equal(z, v[1]))}))
IsGoodId = function(Id){
Sub = Data[Data$ID==Id,]
if (length(Sub[,1]) > 1){
return(compare(Sub[,"Size"]))
}else{
return(TRUE)
}
}
WhichAreGood = mclapply(unique(Data$ID),IsGoodId)
But it's painfully, awfully, terribly slow on my quad-core i5.
Can anyone see where the bottleneck is? I'm a newbie to R optimisation.
Thanks, -N