Here's a solution using dplyr, which is great for handling this kind of problem. I created some simulated data matching your example:
library(dplyr)
## fake test data set
set.seed(1)  # any seed works; it only makes the simulated data reproducible
combo.test <- data.frame(
  CUSTID = sample(rep(10000:999999, each = 2), 800000, replace = FALSE),
  typeGas = sample(c(0, 1), 800000, replace = TRUE)
)
combo.test$typeElec <- ifelse(combo.test$typeGas == 0, 1, 0)
To assign "1" to typeTest when a customer has a 1 for both typeElec and typeGas, possibly in different rows, use the dplyr "group_by" function to process each distinct CUSTID in your data.frame as a group, then "mutate" to create a new variable "typeTest". Within each group, "ifelse" returns 1 only if "any" value in the typeGas column is 1 and "any" value in the typeElec column is 1; otherwise it returns 0.
# convert to tbl_df object, arrange by CUSTID, assign 1 to variable typeTest
# if CUSTID has values for 1 in both typeGas and typeElec
ptm <- proc.time()
combo.test <- combo.test %>%
  tbl_df() %>%
  arrange(CUSTID) %>%
  group_by(CUSTID) %>%
  mutate(typeTest = ifelse(any(typeGas == 1) & any(typeElec == 1), 1, 0)) %>%
  ungroup()
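proc.time() - ptm
"tbl_df()" converts the data.frame to a tibble, dplyr's data.frame variant that prints compactly (recent dplyr releases deprecate it in favor of "as_tibble()"), and each pipe operator "%>%" passes the output of one function to the next as its first argument. The code took about 10 seconds to run for me.
For more background, see the dplyr introduction vignette: https://cran.rstudio.com/web/packages/dplyr/vignettes/introduction.html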
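To check the grouping logic by eye, here is a minimal sketch on a five-row toy data set. The names `toy` and `toy_result` and the CUSTID values are made up for illustration; only the column names come from the simulated data above.

```r
library(dplyr)

# Hypothetical toy data: customer 101 has gas in one row and electricity
# in another, 102 has gas only, 103 has electricity only (twice)
toy <- data.frame(
  CUSTID   = c(101, 101, 102, 103, 103),
  typeGas  = c(1, 0, 1, 0, 0),
  typeElec = c(0, 1, 0, 1, 1)
)

toy_result <- toy %>%
  group_by(CUSTID) %>%
  # within each CUSTID group, flag every row if both types occur somewhere
  mutate(typeTest = ifelse(any(typeGas == 1) & any(typeElec == 1), 1, 0)) %>%
  ungroup()

print(toy_result)
```

Only the two rows for customer 101 should end up with typeTest equal to 1, since that is the only customer with both a gas row and an electricity row.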
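UPDATE: Right, I should have answered your original question instead of only giving an alternative method. There was only one bug in your function: line 3 should have indexed column 1 (CUSTID) instead of column 2. The speed problem comes from how R handles assignment into a data.frame versus a plain vector: modifying one data.frame cell on every loop iteration is much more expensive than writing into a preallocated vector. Here's a good discussion: (Speed up the loop operation in R). The revised function below collects the results in a vector and attaches it to the data.frame once, after the loop.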
elecOrGas2 <- function(myData) {
  res <- numeric(nrow(myData))  # preallocate a vector for 'typeTest'
  # seq_len() avoids the backwards 1:0 sequence when myData has a single row
  for (i in seq_len(max(nrow(myData) - 1, 0))) {
    # original (buggy): if (myData[i,2]==myData[i+1,2])
    if (myData[i, 1] == myData[i + 1, 1]) {  # column 1 is CUSTID
      if ((myData$typeGas[i] == myData$typeElec[i + 1]) |
          (myData$typeElec[i] == myData$typeGas[i + 1])) {
        res[i] <- 1  # write to the vector instead of myData$typeTest[i]
      } else {
        res[i] <- 0
      }
    } else {
      res[i] <- 0
    }
  }
  myData$typeTest <- res  # attach the result once, after the loop
  return(myData)
}
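The reason the function above fills `res` rather than `myData$typeTest[i]` can be seen in isolation. The following is a rough sketch, not a careful benchmark: the size `n` and the names `df`, `df2`, `t_df`, and `t_vec` are arbitrary choices for illustration, and the exact timings will vary by machine and R version.

```r
n <- 20000
df <- data.frame(x = numeric(n))

# Slow pattern: modify one data.frame cell per iteration
t_df <- system.time(
  for (i in seq_len(n)) df$x[i] <- i
)

# Faster pattern: fill a preallocated vector, build the data.frame once
res <- numeric(n)
t_vec <- system.time({
  for (i in seq_len(n)) res[i] <- i
  df2 <- data.frame(x = res)
})

# Elapsed seconds for each pattern
print(c(data.frame = t_df[["elapsed"]], vector = t_vec[["elapsed"]]))
```

Both loops produce the same column; only the cost per iteration differs.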
library(dplyr)
combo.test <- data.frame(
  CUSTID = sample(rep(10000:999999, each = 2), 800000, replace = FALSE),
  typeGas = sample(c(0, 1), 800000, replace = TRUE)
)
combo.test$typeElec <- ifelse(combo.test$typeGas == 0, 1, 0)
combo.test <- arrange(combo.test, CUSTID) %>% tbl_df()
# test time using 1/10 of the data
# original function from the question (elecOrGas, not repeated here): 29 sec
system.time(test1 <- elecOrGas(combo.test[1:80000, ]))
# updated function, which fills a preallocated vector (still a loop): 6 sec
system.time(test2 <- elecOrGas2(combo.test[1:80000, ]))
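Going one step further, the same adjacent-row comparison can be written without an explicit loop by comparing each column against a copy of itself shifted by one row. This is a sketch under the assumption that the columns are named CUSTID, typeGas, and typeElec as in the simulated data and that the rows are already sorted by CUSTID; the names `elecOrGasVec`, `toy`, and `toy_out` are hypothetical.

```r
elecOrGasVec <- function(myData) {
  n <- nrow(myData)
  res <- numeric(n)                      # last row always stays 0
  if (n > 1) {
    i <- seq_len(n - 1)                  # rows 1..n-1, each compared to row i+1
    same_id <- myData$CUSTID[i] == myData$CUSTID[i + 1]
    crossed <- (myData$typeGas[i] == myData$typeElec[i + 1]) |
               (myData$typeElec[i] == myData$typeGas[i + 1])
    res[i] <- as.numeric(same_id & crossed)
  }
  myData$typeTest <- res
  myData
}

# Hypothetical toy data, already sorted by CUSTID
toy <- data.frame(
  CUSTID   = c(1, 1, 2, 3, 3),
  typeGas  = c(1, 0, 1, 0, 0),
  typeElec = c(0, 1, 0, 1, 1)
)
toy_out <- elecOrGasVec(toy)
print(toy_out)
```

Like the loop version, this flags only the first row of a matching pair. It can be timed with system.time() in the same way as the two functions above.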