1

This is my data:

# Test datasets
test_df <- data.frame(A =c(1, 2, 3, 3, 4), AKH_UL =c(111, 222, 333, 444, 555), AKH_LL = c(222, 333, 444, 555, 666),
                     AKH_UU = c(213, 242, 253, 546, 243), AKH_LU = c(453, 855, 784, 352, 585), FFL_UL =c(111, 222, 333, 444, 555), FFL_LL = c(222, 333, 444, 555, 666), FFL_UU = c(213, 242, 253, 546, 243), FFL_LU = c(453, 855, 784, 352, 585))

I want to create two columns, AKH and FFL. The value of the column depends on conditions and each condition is connected to a certain function:

Simplified functions:

# Case 1: 
myfunction1 <- function(cost, cost_LL, cost_UL, cost_LU, cost_UU){

  test_df$cost <- cost_LL *cost_UL + cost_UU * cost_LU
}

# Case 2:
myfunction2 <-function(cost,cost_LL, cost_LU){
 test_df$cost <- cost_LL *cost_LU 
}

# Case 3: 
myfunction3 <-function(cost,cost_UL, cost_UU){
  test_df$cost <- costUL *costUU
}


Right now I'm doing it in two separate steps for each column. For example like this for AKH:

test_df$AKH <-
  ifelse(test_df$A ==  1, 
      myfunction1(test_df$AKH, test_df$AKH_LL, test_df$AKH_UL, test_df$AKH_LU, test_df$AKH_UU),
        ifelse((test_df$A == 2, 
           myfunction2(test_df$AKH, test_df$AKH_LL, test_df$AKH_LU),
              ifelse((test_df$A == 3,
                 myfunction3(test_df$AKH, test_df$AKH_UL, test_df$AKH_UU),
                     99999))))

The same calculation I do for a second term, just with FFL instead of AKH inside the formulas.

This looks pretty horrible (in rl it's not only two but 10 columns) and I'm afraid the other children will make fun of me when they see my script.

I checked out this question, but wasn't able to fully transfer it to my problem, because I have no idea how to make the connection between the variable name in the formula and the column name.

Jana Keller
  • 107
  • 7

2 Answers2

2

Replace test_df$AKH for test_df[,column_names[i]] and use for and paste functions.

Example:

column_names <- c("AKH", "FFL")

for(i in 1:length(column_names){

test_df[,column_names[i]]<-
 ifelse(test_df$A ==  1, 
     myfunction1(test_df[,column_names[i]]
                 test_df[,paste(column_names[i],"LL",sep = "_")], 
                 test_df[,paste(column_names[i],"UL",sep = "_")], 
                 test_df[,paste(column_names[i],"LU",sep = "_")], 
                 test_df[,paste(column_names[i],"UU",sep = "_")]),
      ifelse((test_df$A == 2, 
          myfunction2(test_df[,column_names[i]], 
                      test_df[,paste(column_names[i],"LL",sep = "_")],
                      test_df[,paste(column_names[i],"LU",sep = "_")]),
             ifelse((test_df$A == 3,
                myfunction3(test_df[,column_names[i]],
                            test_df$[,paste(column_names[i],"UL",sep = "_")], 
                            test_df$[,paste(column_names[i],"UU",sep = "_")]),
                            99999))))

}

Brutalroot
  • 311
  • 3
  • 8
1

Your simplified functions don't make use of cost anywhere, so you can probably omit these. Also, you don't really need to update the data frame within the functions; they should just return the calculated value (your costs). So your functions should look like this:

# Case 1: 
myfunction1 <- function(cost_LL, cost_UL, cost_LU, cost_UU) {
  cost_LL * cost_UL + cost_UU * cost_LU
}

# Case 2:
myfunction2 <- function(cost_LL, cost_LU) {
  cost_LL * cost_LU 
}

# Case 3: 
myfunction3 <-function(cost_UL, cost_UU) {
  cost_UL * cost_UU
}

I also fixed the naming inconsistencies.

To calculate the costs based on the "A" variable, I reshaped the data frame using the tidyr package so that you have only five columns, the "A" column and the four "cost" columns that are used in your functions (UL, LL, UU and LU). This way you can have as many sets of variables as you like. Then, using mapply, I create the cost column.

library(tidyr)
library(dplyr)

test_df %>%
  pivot_longer(cols=-A, names_to=c("ID", ".value"),
               names_pattern="(.+)_([U|L][L|U])") %>% # Run to here to see the result
  group_by(A, ID) %>%
  transmute(cost=ifelse(A==1, mapply(myfunction1, LL, UL, LU, UU),
                     ifelse(A==2, mapply(myfunction2, LL, LU),
                            ifelse(A==3, mapply(myfunction3, UL, UU), NA))))

# A tibble: 10 x 3
# Groups:   A, ID [8]
       A ID      cost
   <dbl> <chr>  <dbl>
 1     1 AKH   121131
 2     1 FFL   121131
 3     2 AKH   284715
 4     2 FFL   284715
 5     3 AKH    84249
 6     3 FFL    84249
 7     3 AKH   242424
 8     3 FFL   242424
 9     4 AKH       NA
10     4 FFL       NA

This should work on any number of variable sets.


Edit: To obtain the costs in wide format, we need to add an id variable first because of the duplicates in the "A" column. The rest is similar to the above except the last line pivots the result into a wide form for each variable.

test_df %>%
  group_by(A) %>%
  mutate(id=row_number()) %>%
  pivot_longer(cols=-c(A,id), names_to=c("ID", ".value"),
               names_pattern="(.+)_([U|L][L|U])") %>% # Run to here to see the result
  group_by(A, id, ID) %>%
  transmute(cost=ifelse(A==1, mapply(myfunction1, LL, UL, LU, UU),
                        ifelse(A==2, mapply(myfunction2, LL, LU),
                               ifelse(A==3, mapply(myfunction3, UL, UU), NA)))) %>%
  pivot_wider(id_cols=c(A, id), names_from=ID, values_from = cost)

# A tibble: 5 x 4
# Groups:   A, id [5]
      A    id    AKH    FFL
  <dbl> <int>  <dbl>  <dbl>
1     1     1 121131 121131
2     2     1 284715 284715
3     3     1  84249  84249
4     3     2 242424 242424
5     4     1     NA     NA
Edward
  • 10,360
  • 2
  • 11
  • 26
  • Many thanks for your answer! I need the output in wide format eventually, to rbind it with another dataset. I know I can reshape it afterwords, but I would prefer a way without this step in between if it possible. I will try to adapt your solution to work like this. – Jana Keller Mar 20 '20 at 14:41
  • That's very possible but I'll have to add an id variable.... check my edit.... – Edward Mar 20 '20 at 15:04