R timestep subtraction

Question

I am trying to build code for analysing long-term data that will be continually added to, both for the ids already in the data and for new ids that may be added later. Because of this, I want to make sure that my code doesn't have to be changed dramatically each time we add more data. I've used dplyr/tidyr to spread the data and subtract column by column, but that isn't really feasible as the data get longer, because the code has to be modified so much each time.

Here is a subset of the data:

    data <- structure(list(
      pinid = structure(c(1L, 2L, 1L, 2L, 1L, 2L),
                        .Label = c("CP_South_1_1", "CP_South_1_2"),
                        class = "factor"),
      reading_date = structure(c(16308, 16308, 16531, 16531, 16728, 16728),
                               class = "Date"),
      timestep = c("t0", "t0", "t1", "t1", "t2", "t2"),
      measurement = c(189, 186, 187, 185, 184, 181)),
      .Names = c("pinid", "reading_date", "timestep", "measurement"),
      row.names = c(NA, -6L), class = "data.frame")

I am trying to subtract the values sequentially within each pinid, so that I get t1-t0, t2-t1, etc. It would be better if this worked with sequential dates rather than timesteps, since the timestep is an extra field to fill in during data entry; if it works with numeric timesteps, I can probably make that work as well.
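If the timestep is only there to order each pin's readings, it could be derived from `reading_date` instead of being typed in. A minimal base-R sketch, assuming each pin has at most one reading per date (the `ties.method` choice and the helper variable `step` are illustrative assumptions):

```r
# Sample data from the question, minus the hand-entered timestep column
data <- data.frame(
  pinid = factor(rep(c("CP_South_1_1", "CP_South_1_2"), 3)),
  reading_date = as.Date(rep(c(16308, 16531, 16728), each = 2), origin = "1970-01-01"),
  measurement = c(189, 186, 187, 185, 184, 181)
)

# Number each pin's readings 0, 1, 2, ... in date order
step <- ave(as.numeric(data$reading_date), data$pinid,
            FUN = function(d) rank(d, ties.method = "first") - 1)

# Build labels like "t0", "t1", ... from those numbers
data$timestep <- paste0("t", step)
data$timestep
```

New ids and new dates are numbered automatically, so nothing in the code has to change when rows are appended.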

So far I've had some success with the following code from this question:

    pin_dif <- function(x) setNames(
      data.frame(pinid = x$pinid, as.list(- combn(x$measurement, 2, diff))),
      c("pinid", combn(x$timestep, 2, paste, collapse = "_"))
    )
    by(data, data$pinid, pin_dif)

However, the differences come out with the opposite sign to what I want (t0 minus t1 rather than t1 minus t0), and each pinid's result is printed as a separate block with the same row repeated, like this:

    data$pinid: CP_South_1_1
             pinid t0_t1 t0_t2 t1_t2
    1 CP_South_1_1     2     5     3
    2 CP_South_1_1     2     5     3
    3 CP_South_1_1     2     5     3
    ------------------------------------------------------------------------
    data$pinid: CP_South_1_2
             pinid t0_t1 t0_t2 t1_t2
    1 CP_South_1_2     1     5     4
    2 CP_South_1_2     1     5     4
    3 CP_South_1_2     1     5     4

Ideally the result would be a single data frame like this:

             pinid t0_t1 t0_t2 t1_t2
    1 CP_South_1_1    -2    -5    -3
    2 CP_South_1_2    -1    -5    -4
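For reference, the sign flip in the code above comes from the leading `-` (`diff()` already returns the later value minus the earlier one), and the repeated rows come from passing the whole `x$pinid` vector to `data.frame()`, which recycles the single row of differences to that length. A minimal patch under those assumptions; note that it still computes every pair:

```r
data <- data.frame(
  pinid = factor(rep(c("CP_South_1_1", "CP_South_1_2"), 3)),
  timestep = rep(c("t0", "t1", "t2"), each = 2),
  measurement = c(189, 186, 187, 185, 184, 181),
  stringsAsFactors = FALSE
)

pin_dif <- function(x) {
  x <- x[order(x$timestep), ]  # make sure pairs run earlier -> later
  setNames(
    # combn(..., diff) gives later minus earlier for each pair (e.g. t1 - t0);
    # a single pinid value means a single output row per pin
    data.frame(pinid = x$pinid[1], as.list(combn(x$measurement, 2, diff))),
    c("pinid", combn(x$timestep, 2, paste, collapse = "_"))
  )
}

# by() returns one data frame per pin; rbind stacks them into one table
result <- do.call(rbind, by(data, data$pinid, pin_dif))
rownames(result) <- NULL
result
```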

To cut down on processing time, I would also prefer not to subtract every combination of values. There is only one extra combination in this small dataset (t0-t2), but with more time steps there are many more differences that I don't need.
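Restricting the calculation to consecutive readings only needs each pin's rows sorted by date and a lag-1 difference. A base-R sketch under that assumption, using the column names from the sample data (`change`, `key`, `long` and `wide` are illustrative names):

```r
data <- data.frame(
  pinid = factor(rep(c("CP_South_1_1", "CP_South_1_2"), 3)),
  reading_date = as.Date(rep(c(16308, 16531, 16728), each = 2), origin = "1970-01-01"),
  timestep = rep(c("t0", "t1", "t2"), each = 2),
  measurement = c(189, 186, 187, 185, 184, 181),
  stringsAsFactors = FALSE
)

# Sort by pin, then by date, so each pin's readings are in time order
data <- data[order(data$pinid, data$reading_date), ]

# Within each pin: current reading minus the previous one (NA for the first)
data$change <- ave(data$measurement, data$pinid,
                   FUN = function(m) c(NA, diff(m)))
# Label each difference "<previous step>_<current step>", e.g. "t0_t1"
data$key <- ave(data$timestep, data$pinid,
                FUN = function(s) c(NA, paste(head(s, -1), tail(s, -1), sep = "_")))

long <- data[!is.na(data$change), c("pinid", "key", "change")]

# One row per pin, one column per consecutive pair
wide <- reshape(long, idvar = "pinid", timevar = "key", direction = "wide")
names(wide) <- sub("^change\\.", "", names(wide))
rownames(wide) <- NULL
wide
```

With n readings per pin this computes n - 1 differences instead of n(n - 1)/2.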

Thanks for any help.

score 1 · Answer 1 · answered Nov 20 '17 at 18:31

I would suggest working with a data.table; it makes this kind of manipulation much easier.

You can adjust the following script to your preference (e.g. take fewer combinations) and wrap it in a function.

    library(data.table)
    library(magrittr)  # for %>%
    library(stringr)   # for word()

    data <- data %>% as.data.table()
    # one row per pinid, one column per timestep
    data <- data %>% dcast.data.table(formula = pinid ~ timestep, value.var = "measurement")
    data2 <- data %>% copy()

    # all pairs of timestep columns where the second label sorts after the first
    combs <- expand.grid(names(data[, 2:ncol(data)]), names(data[, 2:ncol(data)])) %>% as.data.table()
    combs <- combs[as.character(Var2) > as.character(Var1)][, var3 := paste(Var1, Var2, sep = "_")]

    # for each pair "a_b", add a column holding b - a
    for (i in combs$var3) {
      data2[, (i) := get(word(string = i, start = 2, sep = "_")) - get(word(string = i, start = 1, sep = "_"))]
    }
    # drop the original timestep columns, keeping pinid and the differences
    names_vars <- names(data[, 2:ncol(data)])
    data2 <- data2[, !names_vars, with = FALSE]

    data2
              pinid t0_t1 t0_t2 t1_t2
    1: CP_South_1_1    -2    -5    -3
    2: CP_South_1_2    -1    -5    -4
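The "take fewer combinations" adjustment can also be done in long format with data.table's `shift()`, which avoids building the full pair grid and keeps only consecutive differences. A sketch, assuming one reading per pin and date:

```r
library(data.table)

dt <- data.table(
  pinid = rep(c("CP_South_1_1", "CP_South_1_2"), 3),
  reading_date = as.Date(rep(c(16308, 16531, 16728), each = 2), origin = "1970-01-01"),
  timestep = rep(c("t0", "t1", "t2"), each = 2),
  measurement = c(189, 186, 187, 185, 184, 181)
)

# Order by pin and date, then take each reading minus the previous one within its pin
setorder(dt, pinid, reading_date)
dt[, `:=`(key = paste(shift(timestep), timestep, sep = "_"),
          change = measurement - shift(measurement)),
   by = pinid]

# Drop each pin's first reading (no predecessor), then one column per pair
wide <- dcast(dt[!is.na(change)], pinid ~ key, value.var = "change")
wide
```

Because nothing refers to specific timestep names, new pins or new dates need no code changes.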

score 0 · Answer 2 · answered Nov 20 '17 at 20:50

Give this method a try

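    library(tidyverse)

    data %>%
      select(pinid, timestep, measurement) %>%  # otherwise reading_date would split each pin into separate nests
      arrange(pinid, timestep) %>%
      nest(data = c(timestep, measurement)) %>% # one row per pinid, readings in a nested data frame
      mutate(data = map(data, ~ data.frame(
        key   = paste(combn(.x$timestep, 2)[1, ], combn(.x$timestep, 2)[2, ], sep = "_"),
        value = combn(.x$measurement, 2)[2, ] - combn(.x$measurement, 2)[1, ]  # later minus earlier
      ))) %>%
      unnest(cols = data) %>%
      spread(key, value)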

Output

    # A tibble: 2 x 4
             pinid t0_t1 t0_t2 t1_t2
    *       <fctr> <dbl> <dbl> <dbl>
    1 CP_South_1_1    -2    -5    -3
    2 CP_South_1_2    -1    -5    -4
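For comparison, a dplyr/tidyr sketch that computes only consecutive differences with `lag()` and widens with `pivot_wider()` (tidyr 1.0 or later is assumed):

```r
library(dplyr)
library(tidyr)

data <- tibble(
  pinid = rep(c("CP_South_1_1", "CP_South_1_2"), 3),
  reading_date = as.Date(rep(c(16308, 16531, 16728), each = 2), origin = "1970-01-01"),
  timestep = rep(c("t0", "t1", "t2"), each = 2),
  measurement = c(189, 186, 187, 185, 184, 181)
)

wide <- data %>%
  arrange(pinid, reading_date) %>%
  group_by(pinid) %>%
  # each reading minus the previous reading of the same pin
  mutate(key = paste(lag(timestep), timestep, sep = "_"),
         change = measurement - lag(measurement)) %>%
  ungroup() %>%
  filter(!is.na(change)) %>%  # first reading of each pin has no predecessor
  select(pinid, key, change) %>%
  pivot_wider(names_from = key, values_from = change)
wide
```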
