R Calculate the difference between values from one to all the other columns

Question

I am having trouble finding the best and most efficient way to perform a few calculations on my data in a Shiny app. First, I would like to calculate the difference between every column (except ID) and one reference column, create a new column with a specific name for each difference, and then perform a small calculation on the results. I will explain it with example data:

data <- structure(list(ID = 1:2,
                       Zeit600 = c(601.782608695652, 602.625),
                       Zeit650 = c(504.705882352941, 546.666666666667),
                       Zeit700 = c(321.26582278481, 316.666666666667),
                       Zeit750 = c(264.303797468354, 261.111111111111),
                       Zeit800 = c(207.341772151899, 205.555555555556)),
                  row.names = c(NA, -2L),
                  .Names = c("ID", "Zeit600", "Zeit650", "Zeit700", "Zeit750", "Zeit800"),
                  class = "data.frame")

Here is the same data in a form that is easier to read:

  ID  Zeit600  Zeit650  Zeit700  Zeit750  Zeit800
1  1 601.7826 504.7059 321.2658 264.3038 207.3418
2  2 602.6250 546.6667 316.6667 261.1111 205.5556

What I would like to do:

1. Calculate the difference between each column (except ID) and the column named Zeit800, and name each result, if possible, T800_T followed by the number next to Zeit (for example T800_T600).

*My original data is reactive in Shiny, so the number of Zeit... columns will vary; only column Zeit800 is always present.

The result will look like this:

  ID  Zeit600  Zeit650  Zeit700  Zeit750  Zeit800 T800_T600 T800_T650 T800_T700 T800_T750
1  1 601.7826 504.7059 321.2658 264.3038 207.3418  394.4408  297.3641  113.9241  56.96203
2  2 602.6250 546.6667 316.6667 261.1111 205.5556  397.0694  341.1111  111.1111  55.55556

2. Then I would like to perform a small calculation: take the difference between 800 and the number next to Zeit... in the column name, and divide it by the value calculated in point 1 (T800...). For example, for column Zeit600 and ID = 1:

(800-600)/T800_T600 = (800-600)/394.4408 = 0.507
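The arithmetic of this example can be checked directly in R. This is only a sketch of the single-cell calculation, using the unrounded values for ID = 1 from the data above; the variable names are chosen for illustration.

```r
# Values for ID = 1, taken from the example data
zeit600 <- 601.782608695652
zeit800 <- 207.341772151899

# Point 1: difference to the reference column
t800_t600 <- zeit600 - zeit800

# Point 2: note the parentheses around 800 - 600
abkuehlrate <- (800 - 600) / t800_t600
round(abkuehlrate, 3)  # 0.507
```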

The whole data frame would look like:

  ID  Zeit600  Zeit650  Zeit700  Zeit750  Zeit800 T800_T600 T800_T650 T800_T700 T800_T750 Abkuehlrate_T800_600 Abkuehlrate_T800_650
1  1 601.7826 504.7059 321.2658 264.3038 207.3418  394.4408  297.3641  113.9241  56.96203            0.5070469            0.5044321
2  2 602.6250 546.6667 316.6667 261.1111 205.5556  397.0694  341.1111  111.1111  55.55556            0.5036902            0.4397394
  Abkuehlrate_T800_700 Abkuehlrate_T800_750
1            0.8777778            0.8777778
2            0.9000000            0.9000000

Thanks for help!

LAP · Accepted Answer · 2017-01-30T12:24:17.707

2

Here is the whole operation in the form of a function:

myfun <- function(var, compvar, data) {
  # Difference between each column in `var` and the reference column `compvar`
  diffcol <- as.data.frame(lapply(data[var], function(x) x - data[[compvar]]))
  names(diffcol) <- paste(compvar, var, sep = "_")
  mydata <- cbind(data, diffcol)

  # Cooling rate: (reference number - column number) / difference column.
  # SIMPLIFY = FALSE keeps the result a list, so single-row data also works.
  abkuehlrate <- as.data.frame(mapply(function(x, y)
    (as.numeric(gsub("T", "", compvar)) - as.numeric(gsub("T", "", x))) / y,
    var, diffcol, SIMPLIFY = FALSE))
  names(abkuehlrate) <- paste("Abkuehlrate", compvar, gsub("T", "", var), sep = "_")
  mydata <- cbind(mydata, abkuehlrate)
  return(mydata)
}

You call it with the variable names as strings and the data frame (here mydf is the example data with the columns renamed to T600, T650, and so on):

res <- myfun("T600", "T800", mydf)

Because the function takes a character vector of variable names, you can extract them from your data in any way you want. Example:

myvars <- names(mydf[,2:5])
newdf <- myfun(myvars, "T800", mydf)

Output:

> newdf
  ID     T600     T650     T700     T750     T800 T800_T600 T800_T650 T800_T700 T800_T750 Abkuehlrate_T800_600
1  1 601.7826 504.7059 321.2658 264.3038 207.3418  394.4408  297.3641  113.9241  56.96203            0.5070469
2  2 602.6250 546.6667 316.6667 261.1111 205.5556  397.0694  341.1111  111.1111  55.55556            0.5036902

  Abkuehlrate_T800_650 Abkuehlrate_T800_700 Abkuehlrate_T800_750
1            0.5044321            0.8777778            0.8777778
2            0.4397394            0.9000000            0.9000000

Edit: a final small edit to get the exact variable names you wanted. If your variables have to be named Zeit600 etc., just replace "T" with "Zeit" in the gsub() calls.
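Since the number of Zeit... columns changes in the Shiny app, the column names can also be picked by pattern instead of by position. The following is a minimal sketch of that idea, not the function above: cooling_rates() and its ref and prefix arguments are hypothetical names introduced here, and it assumes every column to process starts with "Zeit" followed by a number.

```r
# Example data from the question (values rounded to four decimals)
data <- data.frame(
  ID      = 1:2,
  Zeit600 = c(601.7826, 602.6250),
  Zeit650 = c(504.7059, 546.6667),
  Zeit700 = c(321.2658, 316.6667),
  Zeit750 = c(264.3038, 261.1111),
  Zeit800 = c(207.3418, 205.5556)
)

# Hypothetical helper, not part of the answer above
cooling_rates <- function(data, ref = "Zeit800", prefix = "Zeit") {
  # Every column starting with the prefix, except the reference column
  vars <- setdiff(grep(paste0("^", prefix), names(data), value = TRUE), ref)

  # Numeric part of the column names, e.g. "Zeit600" -> 600
  num     <- as.numeric(sub(prefix, "", vars, fixed = TRUE))
  ref_num <- as.numeric(sub(prefix, "", ref, fixed = TRUE))

  diff_names <- paste0("T", ref_num, "_T", num)
  rate_names <- paste0("Abkuehlrate_T", ref_num, "_", num)

  # Step 1: difference to the reference column
  for (i in seq_along(vars)) {
    data[[diff_names[i]]] <- data[[vars[i]]] - data[[ref]]
  }
  # Step 2: (reference number - column number) / difference
  for (i in seq_along(vars)) {
    data[[rate_names[i]]] <- (ref_num - num[i]) / data[[diff_names[i]]]
  }
  data
}

result <- cooling_rates(data)
names(result)
```

Because the columns are found with grep(), the same call should work whether the data has two, three, or five Zeit... columns, and also when only one row is present.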

edited Jan 30 '17 at 12:24

answered Jan 27 '17 at 08:50


It looks very good! Just one thing: my data (in this case `mydf`) comes from Shiny and has a reactive number of columns, so there might be **2, 3, 5 or any other number of columns** `Zeit...`. That means I cannot use `...mydf[,2:4]...`; the solution must be general – Mal_a Jan 27 '17 at 09:02
Well it does not really matter, but we need the difference columns to calculate the abkuehlraten columns – Mal_a Jan 27 '17 at 09:09
Interesting, but when I run your code, I get an error: `Error in data.frame(..., check.names = FALSE) : arguments imply differing number of rows: 2, 3 In addition: Warning messages: 1: In (as.numeric(gsub("T", "", compvar)) - as.numeric(gsub("T", "", : longer object length is not a multiple of shorter object length 2: In (as.numeric(gsub("T", "", compvar)) - as.numeric(gsub("T", "", : longer object length is not a multiple of shorter object length 3: In (as.numeric(gsub("T", "", compvar)) - as.numeric(gsub("T", "", : longer object length is not a multiple of shorter object..` – Mal_a Jan 27 '17 at 09:34
Thanks so much. The difference columns are calculated correctly, but unfortunately I think the abkuehlrate values are wrong (they do not match the correct values). – Mal_a Jan 27 '17 at 09:59
I have one more question: I have just noticed that if I have only one **ID** (ID = 1, only one data row; in the Shiny app the dataset is reactive and depends on user input), your function gives an error: `Error in names(abkuehlrate) <- paste("Abkuehlrate", compvar, gsub("T", : 'names' attribute [4] must be the same length as the vector [1]`... Any ideas how to solve it? – Mal_a Jan 30 '17 at 11:59
Hey, I got you :) The problem was that `mapply` used a one-dimensional vector as a column of a `data.frame`, instead of a row. I fixed it by setting the `SIMPLIFY`-option to false, so that `mapply` does not try to coerce its output into a vector. Should work now with the new version :) – LAP Jan 30 '17 at 12:23
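The effect of `SIMPLIFY` described in the last comment can be reproduced in isolation. This is a small sketch with made-up one-row input; `rate`, `simplified` and `kept` are names chosen for illustration and do not appear in the answer.

```r
# One-row "difference" columns, as produced when only one ID is present
diffcol <- data.frame(T800_T600 = 394.4408, T800_T650 = 297.3641)
var <- c("T600", "T650")

rate <- function(x, y) (800 - as.numeric(gsub("T", "", x))) / y

# Default SIMPLIFY = TRUE collapses the two length-1 results into one vector,
# which as.data.frame() turns into a single column with two rows
simplified <- as.data.frame(mapply(rate, var, diffcol))
dim(simplified)  # 2 rows, 1 column

# SIMPLIFY = FALSE keeps a list with one element per variable,
# which as.data.frame() turns into one row with two columns
kept <- as.data.frame(mapply(rate, var, diffcol, SIMPLIFY = FALSE))
dim(kept)  # 1 row, 2 columns
```

With the simplified result there is only one column to name, which is why assigning several names to it fails for single-row data.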

Paulo MiraMor · Answer 2 · 2017-01-30T14:46:01.210

2

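# All Zeit... columns except the reference column
subData <- subset(data, select = -c(ID, Zeit800))
# Numeric part of each column name, e.g. "Zeit600" -> 600
numbers <- as.numeric(gsub("\\D", "", names(subData)))
namesT <- paste0("T800_T", numbers)
# Difference of every column to Zeit800
T800 <- subData - data$Zeit800
data[, namesT] <- T800
namesAbkuehlrate <- paste0("Abkuehlrate_T800_", numbers)
# (800 - number) divided by the matching difference column
data[, namesAbkuehlrate] <- mapply('/', (800 - numbers), T800)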

edited Jan 30 '17 at 14:46

answered Jan 27 '17 at 11:52

