3

How can I calculate pairwise difference between values in one column?

The calculation should start with the first two values, and should be continued with the next two values as it is done in column "desired_result" here:

here

data <- data.frame(data = c(5, NA, NA, NA, 3, NA, NA, 4, NA, 3, NA, NA, NA, 6, 1, 4, NA, 2))
JonathanDavidArndt
  • 2,518
  • 13
  • 37
  • 49
flobrr
  • 197
  • 1
  • 2
  • 13
  • What have you tried? You might be able to do it with `zoo::na.locf`, `ifelse`, and a little more logical-glue. – r2evans Feb 19 '18 at 17:26
  • How about removing the `NA` values and then simply using the `diff` function? – tstudio Feb 19 '18 at 17:27
  • tstudio, thank you, the diff function solves my problem. Very easy if you know that function, but if you are an r rookie, you don not know these functions ;-) write it as answer and I give you the upvote – flobrr Feb 19 '18 at 17:39

3 Answers3

4

Here's a one-liner:

data$desired_result[which(!is.na(data$data))[c(FALSE, TRUE)]] <- 
  rev(diff(rev(na.omit(data$data))))[c(TRUE, FALSE)]

where which(!is.na(data$data)) finds non-NA entries of data$data and then adding c(FALSE, TRUE) chooses only every second one. Also, na.omit(data$data) discards NA values, rev reverses this vector, diff takes differences, rev reverses the vector back to the correct order, and, lastly, since we don't want all the differences, I again choose every second with c(TRUE, FALSE).

Julius Vainora
  • 47,421
  • 9
  • 90
  • 102
3

Same as Julius but even shorter and faster:

data$desired_result[which(!is.na(data$data))[c(FALSE, TRUE)]] <- 
  diff(na.omit(data$data))[c(TRUE, FALSE)] * -1

since diff() calculates x1 - x0, both rev() can be replace by diff() * -1

Speed comparison using microbenchmark:

Unit: microseconds
   expr    min     lq     mean     median      uq        max        neval   cld
   julius  38.096  43.757 51.44687 46.143      50.8655   170511.851 1e+05   b
   this    32.828  37.501 43.02233 39.548      43.4390   7405.489   1e+05   a
Aleh
  • 776
  • 7
  • 11
1

if you want to have a result exactly like you described here you can use:

> data <- data.frame(data = c(5, NA, NA, NA, 3, NA, NA, 4, NA, 3, NA,
> NA, NA, 6, 1, 4, NA, 2)) %>%   mutate(index = 1:n())
> 
> ex = data %>% filter(!is.na(data))
> 
> df2 = data.frame(index = rollapply(ex$index, width = 2, by = 2, last),
>                  desired_results = rollapply(ex$data, width = 2, by = 2, FUN = function (x) -1*diff(x)))
> 
> data2 = left_join(data, df2, by = "index") %>% select(-index)

   data desired_results
1     5              NA
2    NA              NA
3    NA              NA
4    NA              NA
5     3               2
6    NA              NA
7    NA              NA
8     4              NA
9    NA              NA
10    3               1
11   NA              NA
12   NA              NA
13   NA              NA
14    6              NA
15    1               5
16    4              NA
17   NA              NA
18    2               2

but if you just want the difference then you can use:

rollapply(na.omit(data$data), by = 2, width = 2, diff)

beware that you'll get negative results: -2 -1 -5 -2

  • Leonardo: whx do you use "=" instead on "<-"? isnt "<-" the proper way? – flobrr Feb 20 '18 at 11:20
  • 1
    They both work in the context of your question. The differences between them are listed here: https://stackoverflow.com/questions/1741820/assignment-operators-in-r-and – Leonardo Siqueira Feb 20 '18 at 16:59