0

I have a customized function (psup2) that gets data from a data frame and returns a result. The problem is that it takes a while since I am using a "for" loop that runs for every row and column.

Input:

I have a table that contains the ages (table_costumers), an n*m matrix of different terms, and two different mortality tables (for males and females). The mortality tables i´m using contains one column for ages and another one for its corresponding survival probabilities.

Output:

I want to create a separate dataframe with the same size as that of the term table. The function will take the data from the different mortality tables (depending on the gender) and then apply the function above (psup2) taking the ages from the table X and the terms from the matrix terms.

Up to now I managed to create a very inefficient way to do this...but hopefully by using one of the functions from the apply family this could get faster.

The following code shows the idea of what I am trying to do:

#Function

psup2 <- function(x, age, term) {
  P1 = 1
  for (i in 1:term) {
    P <- x[age + i, 2]
    P1 <- P1*P
  }
  return(P1)
}

#Inputs

terms <- data.frame(V1 = c(1,2,3), V2 = c(1,3,4), V2 = c(2,3,4)) 
male<- data.frame(age = c(0,1,2,3,4,5), probability = c(0.9981,0.9979,0.9978,.994,.992,.99))
female <- data.frame(age = c(0,1,2,3,4,5), probability = c(0.9983,0.998,0.9979,.9970,.9964,.9950))
table_customers <- data.frame(id = c(1,2,3), age = c(0,0,0), gender = c(1,2,1))

#Loop

output <- data.frame(matrix(NA, nrow = 3, ncol = 0))
for (i in 1:3) {
  for (j in 1:3) {
    prob <- ifelse(table_customers[j, 3] == 1, 
                   psup2(male, as.numeric(table_customers[j, 2]), as.numeric(terms[j,i])), 
                   psup2(female, as.numeric(table_customers[j, 2]), as.numeric(terms[j,i])))
    output[j, i] <- prob
  }
}
marc_s
  • 732,580
  • 175
  • 1,330
  • 1,459
  • Please look at https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example and try and form your question in a way that can be tested by people who are willing to help you. I notice you have objects in your code here (male/female) that are not referenced above. Or how you don't provide your data in a form that is simple to run such as: `mortalitytable <- data.frame(age = c(0,1,2), probability = c(0.9981,0.9979,0.9978)) ; table_cusomers <- data.frame(id = c(1,2,3), age = c(56,57,61), gender = c(1,2,1)) ` – Evan Friedland Jun 14 '17 at 16:08
  • thanks, @EvanFriedland now i have modified it so it is ready to run. – tomas maldonado Jun 14 '17 at 16:42
  • You mentioned an n x m matrix but this is all n x n. This makes it difficult for me to understand what is `i` or `j` – Evan Friedland Jun 14 '17 at 18:26

1 Answers1

1

your psup function can be simplified into:

psup2 <- function(x, age, term) { prod(x$probability[age+(1:term)]) }

So actually, we won't use it, we'll use the formula directly. We'll put your male and female df next to each other, so we can use the value of the gender column to choose one or another.

mf <- merge(male,female,by="age") # assuming you have the same ages on both sides
input_df <- cbind(table_customers,terms) 
output <- t(apply(input_df,1,function(x){sapply(1:3,function(i){prod(mf[x["age"]+(1:x[3+i]),x["gender"]+1])})}))

And that's it :)

The sapply function is used to loop on the columns of terms.

x["age"]+(1:x[3+i]) are the indices of the rows you want to multiply

x["gender"]+1 is the relevant column of the mf data.frame

moodymudskipper
  • 46,417
  • 11
  • 121
  • 167