I am trying to do a calculation on each row of a data frame in R and append the result as a new column in the frame. I started with the "by" function, but it was incredibly slow, so I switched to the "apply" function instead. The way I imagine it working is: run apply with my function, save the output to a variable, and append that variable to the original data frame.
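The workflow I have in mind, sketched on a toy data frame (made-up columns `a` and `b`; my real function is below):

```r
# toy data frame standing in for my real policy data
df <- data.frame(a = 1:3, b = 4:6)

# apply the row-wise function, save the result, append it as a new column
tmp <- apply(df, 1, function(row) row["a"] + row["b"])
df$row_sum <- tmp
```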
I created a function to calculate the term length of an insurance plan and return that value, and it works fine on a sample data set. When I use it on my larger data set, I get an error: "cannot allocate vector of size ...". I know many people recommend getting more RAM, but I already have 16GB of memory, and with the entire data set loaded, R reports using only 7.7GB. The data set has 44 columns and ~11 million records, so I don't see how adding one more column of data takes up 8GB of memory.
Any pointer in the right direction would be great.
Below is the function I am using:
get_term_length <- function(row_data){
  # convert values to dates
  expiration_date   <- as.Date( row_data[42] )
  start_date        <- as.Date( row_data[43] )
  cancellation_date <- as.Date( row_data[44] )

  # if the cancellation date is NA, just use the entire policy length
  if( is.na(cancellation_date) ){
    return( expiration_date - start_date )
  }

  # check whether the policy was cancelled early
  if( cancellation_date < expiration_date ){
    return( cancellation_date - start_date )
  } else {
    # the policy ran for the entire term
    return( expiration_date - start_date )
  }
}
I have been running the function by calling:
tmp <- apply(policy_data, 1, get_term_length)