I am new to R. I'm trying to write a function that performs calculations on a column by group and adds the results to the data table as a new column.
I've written a simple version of what I will eventually implement, testing just one column of values. (Eventually there will be several columns of values to evaluate, likely in a loop inside the function.) Once all the calculations are complete across the value columns, I will reduce the data to one row per by-group.
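A minimal sketch of what that loop could look like, assuming two hypothetical value columns `value1` and `value2` and the same `id`/`phase` grouping. Each pass adds one group-wise maximum column by reference, so no rows are dropped:

```r
library(data.table)

# Hypothetical sample with two value columns to be evaluated by group
dt <- data.table(
  id     = c(101, 101, 102, 102),
  phase  = c(0, 0, 1, 1),
  value1 = c(5, 7, 3, 9),
  value2 = c(2, 8, 6, 4)
)

value_cols <- c("value1", "value2")  # hypothetical column names
for (col in value_cols) {
  new_col <- paste0("max_", col)
  # (new_col) := assigns to the column named by the string in new_col;
  # get(col) reads the column whose name is stored in col
  dt[, (new_col) := max(get(col)), by = .(id, phase)]
}
print(dt)
```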
Because I'd like to retain all rows until the calculations across all columns are complete, I do not want to use aggregate(). And because I will be working with millions of records, I would like to use data.table functions.
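For comparison with aggregate(), here is a small sketch on hypothetical data: `:=` with `by` writes the group result onto every row, and the reduction to one row per group can be deferred to a final `unique()`:

```r
library(data.table)

# Hypothetical sample in the same shape as the question's data
dt <- data.table(
  id    = c(101, 101, 101, 102, 102),
  phase = c(0, 0, 1, 0, 0),
  value = c(4.2, 9.1, 3.3, 7.5, 6.0)
)

# aggregate() would return one row per group; := instead adds the
# group-wise result to every row and leaves the row count unchanged
dt[, max_value := max(value), by = .(id, phase)]
nrow(dt)  # still 5

# Once all columns are computed, collapse to one row per id/phase
result <- unique(dt[, .(id, phase, max_value)])
print(result)
```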
The code I've written to evaluate one column works outside the function but not inside the function. I am receiving this error when I try to run the function:
"Error in eval(bysub, xss, parent.frame()) : object 'id' not found"
The traceback points to the same call as the error message:
5. eval(bysub, xss, parent.frame())
4. eval(bysub, xss, parent.frame())
I have read several other questions about this error, but I could not see how they apply to my problem. Could you please help this novice?
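A minimal sketch that seems to reproduce the same error with a toy table (the functions `count_by_bad` and `count_by_good` are hypothetical). With `by = .(g)`, data.table finds no column named `g`, so it evaluates `g` in the function's frame; that forces the promise for the bare symbol `id`, which does not exist as an ordinary variable. Passing the column name as a string avoids the problem:

```r
library(data.table)

dt <- data.table(id = c(1, 1, 2), value = c(3, 5, 4))

# Fails: no column is called `g`, so `g` is evaluated in the function's
# frame, which forces the argument `id` outside the table
count_by_bad <- function(d, g) d[, .N, by = .(g)]
err <- tryCatch(count_by_bad(dt, id), error = function(e) conditionMessage(e))
print(err)

# Works: `by` also accepts a character vector of column names
count_by_good <- function(d, g) d[, .N, by = g]
res <- count_by_good(dt, "id")
print(res)
```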
-- Here is a simple data sample. "phase" refers to a time period, "type" is a character variable naming the type of test, "ind_abc" and "ind_def" flag whether "type" is abc or def, and "value" holds the result of the abc or def test.
tests <- c("abc", "abc", "abc", "def", "def", "def", "abc", "abc", "abc", "abc", "def","abc", "abc","def", "abc", "abc", "abc", "abc")
vec1 <- c(101, 101, 101, 101, 102, 102, 102, 102, 102, 102, 103, 103, 103, 103, 104, 104, 104, 104)
vec2 <- c(0,0,0,1,0,0,1,1,0,0,1,1,0,0,0,1,1)
vec3 <- c(1,1,1,0,0,0,1,1,1,1,0,1,1,0,1,1,1,1)
vec4 <- c(0,0,0,1,1,1,0,0,0,0,1,0,0,1,0,0,0,0)
vec5 <- runif(18, min=3, max = 20)
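mydata <- data.frame(cbind(vec1, vec2, tests, vec3, vec4, vec5))
colnames(x=mydata) <- c("id", "phase", "type", "ind_abc", "ind_def", "value")
mydataDT <- setDT(mydata)
# cbind() returns a character matrix here, so convert the numeric columns back
num_cols <- c("id", "phase", "ind_abc", "ind_def", "value")
mydataDT[, (num_cols) := lapply(.SD, as.numeric), .SDcols = num_cols]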
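As an alternative way to build the sample, a sketch that passes the vectors straight to `data.table()`, which keeps each column's type, and derives the flags from `type` so `ind_abc` and `ind_def` cannot disagree with it. The six rows and their `phase`/`value` numbers are hypothetical:

```r
library(data.table)

# Hypothetical six-row sample with the question's column layout
dt <- data.table(
  id    = c(101, 101, 101, 102, 102, 102),
  phase = c(0, 0, 1, 0, 0, 1),
  type  = c("abc", "abc", "def", "def", "abc", "abc"),
  value = c(5.1, 12.4, 8.0, 15.2, 3.9, 10.7)
)

# Derive the flags from `type` instead of typing them by hand
dt[, ind_abc := as.integer(type == "abc")]
dt[, ind_def := as.integer(type == "def")]

str(dt)  # id, phase and value stay numeric; type stays character
```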
-- My function using data.table syntax
critical_values <- function(dt, categ, result, id_var, time_var) {
  indicator <- paste("ind_", "categ", sep="")
  new <- paste("critical_", "categ", sep="")
  paste("dt", "2", sep="") <- dt[indicator ==1, new:= max(result), by=.(id_var, time_var)]
  return(paste("dt", 2, sep=""))
}
critical_values(mydataDT, type, value, id, phase)
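One possible rewrite, sketched under the assumption that every column is passed as a character string. Note that `paste("ind_", "categ", ...)` quotes `categ`, so it always yields "ind_categ"; here `categ` is assumed to be the label ("abc" or "def") used in the indicator column names, not the `type` column itself. `get()` turns each string into a column lookup, and `by` accepts a character vector:

```r
library(data.table)

# Hypothetical sketch: every column is referred to by a character string
critical_values <- function(dt, categ, result, id_var, time_var) {
  indicator <- paste0("ind_", categ)       # e.g. "ind_abc"
  new_col   <- paste0("critical_", categ)  # e.g. "critical_abc"
  # get() looks up the column named by the string; (new_col) := assigns to
  # the column whose name is stored in new_col; by takes column-name strings
  dt[get(indicator) == 1,
     (new_col) := max(get(result)),
     by = c(id_var, time_var)]
  dt[]  # := works by reference; dt[] returns the updated table visibly
}

dt <- data.table(
  id      = c(101, 101, 101, 102),
  phase   = c(0, 0, 0, 1),
  ind_abc = c(1, 1, 0, 1),
  value   = c(4, 9, 20, 6)
)
res <- critical_values(dt, "abc", "value", "id", "phase")
print(res)
```

Rows where the indicator is not 1 are left as NA in the new column.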
-- Testing the function body outside the function (this works):
temp2dt <- copy(mydataDT)  # copy(): plain <- would point at the same table, and := modifies it in place
new <- paste("critical_", "categ", sep="")
temp2dt[ind_abc ==1, new:= max(value), by=.(id, phase)]
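The test above runs without error, but a bare symbol on the left of `:=` is taken as a literal column name, so the new column is called `new` rather than the string stored in `new`. A small sketch of the difference, with a hypothetical column name:

```r
library(data.table)

dt <- data.table(id = c(1, 1, 2), value = c(3, 5, 4))
new <- "critical_abc"  # hypothetical target column name

# Bare symbol on the left of := : the column is literally named "new"
dt[, new := max(value), by = id]

# Parenthesised: the column name is taken from the variable's value
dt[, (new) := max(value), by = id]

print(names(dt))
```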