I have a matrix with many millions of values. One column is a weirdly formatted date, which I am converting to an actual datetime that I can sort.
I want to speed this up by doing it in parallel. I've had success with minor things in parallel before, but those were easy because I wasn't modifying an existing matrix.
How do I do this in parallel? I can't seem to figure it out...
The code I want to parallelize is...
len = dim(combinedDF)[1]
for(j in 1:len)
{
    # pull the raw timestamp and add the '+' that the timezone offset is missing
    sendTime = combinedDF[j, "tweetSendTime"]
    sendTime = gsub(" 0000", " +0000", sendTime)
    # parse it, then write it back into combinedDF as a string
    updatedTime = strptime(sendTime, "%a %b %d %H:%M:%S %z %Y")
    combinedDF[j, "tweetSendTime"] = toString(updatedTime)
}
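The closest I can sketch (untested, and I'm not sure it's even the right approach) is something with mclapply from the parallel package, converting each timestamp independently and writing the whole column back in one assignment instead of row by row. The worker count is just a guess, and I gather mclapply won't actually fork on Windows:

library(parallel)

## untested sketch: convert every timestamp independently, then assign
## the whole column back once instead of updating combinedDF inside a loop
newTimes = mclapply(combinedDF[, "tweetSendTime"], function(sendTime) {
    sendTime = gsub(" 0000", " +0000", sendTime)
    toString(strptime(sendTime, "%a %b %d %H:%M:%S %z %Y"))
}, mc.cores = 4)
combinedDF[, "tweetSendTime"] = unlist(newTimes)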
EDIT: I was told to also try apply. I tried...
len = dim(combinedDF)[1]
### Using apply
apply(combinedDF, 1, function(combinedDF, y){
    sendTime = combinedDF[y, "tweetSendTime"]
    sendTime = gsub(" 0000", " +0000", sendTime)
    updatedTime = strptime(sendTime, "%a %b %d %H:%M:%S %z %Y")
    combinedDF[y, "tweetSendTime"] = toString(updatedTime)
    combinedDF[y, ]
}, y = 1:len)
However, that throws an error when it runs: Error in combinedDF[y, "tweetSendTime"] : incorrect number of dimensions.
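My best guess at what's going wrong (possibly wrong): apply() hands the function each row as a plain character vector, so inside the function combinedDF is no longer a data frame and indexing it with two subscripts fails. A made-up one-row illustration:

row = c(tweetSendTime = "Wed Oct 10 20:19:24 0000 2018")   # invented example value
row["tweetSendTime"]          # named-vector indexing works fine
# row[1, "tweetSendTime"]     # the failing pattern: incorrect number of dimensions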
EDIT 2: Switching to a function that works on just the tweetSendTime column:
updateTime = function(timeList){
    # same conversion as before, but on a single timestamp value
    sendTime = timeList
    sendTime = gsub(" 0000", " +0000", sendTime)
    updatedTime = strptime(sendTime, "%a %b %d %H:%M:%S %z %Y")
    toString(updatedTime)
}

apply(as.matrix(combinedDF[, "tweetSendTime"]), 1, updateTime)
Seems to work
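Since updateTime handles one timestamp at a time, I assume the same function could be handed to the parallel package to get the speedup I was originally after, e.g. with a PSOCK cluster (untested sketch; the cluster size is a guess):

library(parallel)

## untested: run updateTime across a 4-worker cluster and write the
## converted strings back to the column in one assignment
cl = makeCluster(4)
newTimes = parLapply(cl, combinedDF[, "tweetSendTime"], updateTime)
stopCluster(cl)
combinedDF[, "tweetSendTime"] = unlist(newTimes)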