I'm fairly new to R and come from a C++ background, so I tend to reach for for-loops, but these seem to be very slow in R. Here is a particular example:
dat1 <- cbind(dat1, data.frame(tot.hh = 0, below.18 = 0, above.18 = 0, below.65 = 0, above.65 = 0))
for (i in seq_len(nrow(dat1))) {
  # individuals in dat2 that match household i on both keys
  tmp <- subset(dat2, dat2$hb030 == dat1[i,]$hb030 & dat2$hb020 == dat1[i,]$hb020)
  dat1[i,]$tot.hh <- nrow(tmp)
  # loop over the rows of tmp (length(tmp) would count its columns instead)
  for (j in seq_len(nrow(tmp))) {
    tmp.age <- 2006 - tmp[j,]$rb080
    ifelse(tmp.age < 18, dat1[i,]$below.18 <- dat1[i,]$below.18 + 1, dat1[i,]$above.18 <- dat1[i,]$above.18 + 1)
    ifelse(tmp.age < 65, dat1[i,]$below.65 <- dat1[i,]$below.65 + 1, dat1[i,]$above.65 <- dat1[i,]$above.65 + 1)
  }
}
The idea is that there is one data set of households (dat1) and one of personal data on the individuals in those households (dat2), and I'm trying to add information to each household, such as how many members it has and how many fall into each age group. My code works but takes forever (more than an hour for what is a fairly trivial computation). There are also some obvious inefficiencies, such as the repeated subsetting, but I haven't found a better way of doing this so far. Is there a vectorized approach to this kind of problem?
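For illustration, one possible vectorized formulation of the same computation is sketched below. It is not a definitive solution. It assumes that hb020 and hb030 together identify a household in both tables and that rb080 is a year of birth (as the 2006 - rb080 age calculation implies). The toy data is made up purely so the example runs on its own. The per-person age flags are computed in one step, summed per household with aggregate(), and joined back with merge(), both from base R.

```r
# Toy stand-ins for the two tables (hypothetical values; column names from the question)
dat1 <- data.frame(hb020 = c("AT", "AT", "BE"), hb030 = c(1, 2, 1))
dat2 <- data.frame(
  hb020 = c("AT", "AT", "AT", "BE", "BE"),
  hb030 = c(1, 1, 2, 1, 1),
  rb080 = c(1995, 1960, 1935, 2000, 1970)  # assumed to be year of birth
)

# Compute every individual's age and the 0/1 age flags in one vectorized step
age <- 2006 - dat2$rb080
flags <- data.frame(
  hb020 = dat2$hb020,
  hb030 = dat2$hb030,
  tot.hh = 1,
  below.18 = as.integer(age < 18),
  above.18 = as.integer(age >= 18),
  below.65 = as.integer(age < 65),
  above.65 = as.integer(age >= 65)
)

# Sum the flags within each household (one row per hb020/hb030 pair)
counts <- aggregate(
  cbind(tot.hh, below.18, above.18, below.65, above.65) ~ hb020 + hb030,
  data = flags, FUN = sum
)

# Attach the counts to the household table; all.x = TRUE keeps households
# that have no matching individuals (their counts come back as NA)
result <- merge(dat1, counts, by = c("hb020", "hb030"), all.x = TRUE)
print(result)
```

Two caveats with this sketch: it starts from a dat1 that does not yet contain the five count columns (otherwise merge() would produce suffixed duplicates), and households without any individuals end up with NA rather than 0, so those entries would need to be set to 0 separately if the loop's behaviour is to be matched. The formula interface of aggregate() also drops rows with a missing rb080 by default.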