Calculating differences with specific values in data frame in R

Question

I have the following data frame in RStudio (originally posted as a screenshot).

Timepoints a and b are pre- and post-values, and I want to calculate the difference between the two, i.e. b - a.

I want to do this for each subject and each session separately, meaning that for subject 1 I want to calculate the difference for T1, T2 and T3.

I greatly appreciate any help!

I thought about filtering and subsetting with the tidyverse, but this seems very complicated and I guess there must be an easier way.

Please do not post data or code as images. Take a look at [How to make a great reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) and [How to ask](https://stackoverflow.com/help/how-to-ask) for hints on how to improve your question. — Martin Gal, Apr 13 '23 at 10:42

score 2 · Accepted Answer · answered Apr 13 '23 at 10:44

Here is a dplyr way (the .by argument used below requires dplyr 1.1.0 or later):

library(dplyr)

df <- data.frame(subject = c(rep(1, 6), rep(2, 3)),
                 var = "SMSPAg",
                 session = c("T1", "T1", "T2", "T2", "T3", "T3", "T1", "T1", "T2"),
                 timepoint = c("a", "b", "a", "b", "a", "b", "a", "b", "a"),
                 value = c(50, 48, 52, 65, 51, 61, 53, 50, 54)
                 )

df %>% 
  summarise(diff = last(value) - first(value), .by = c(subject, session))

  subject session diff
1       1      T1   -2
2       1      T2   13
3       1      T3   10
4       2      T1   -3
5       2      T2    0
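
Note that subject 2, session T2 has only an "a" row, so first(value) and last(value) are the same value and the difference comes out as 0. As a possible variant (a sketch, not part of the original answer), the values can be looked up by their timepoint label instead of by row position. This assumes each subject/session combination has at most one row per timepoint; the name res is only illustrative.

```r
library(dplyr)

df <- data.frame(
  subject   = c(rep(1, 6), rep(2, 3)),
  var       = "SMSPAg",
  session   = c("T1", "T1", "T2", "T2", "T3", "T3", "T1", "T1", "T2"),
  timepoint = c("a", "b", "a", "b", "a", "b", "a", "b", "a"),
  value     = c(50, 48, 52, 65, 51, 61, 53, 50, 54)
)

# match() returns the position of the label within the group, or NA if the
# label is absent. Indexing with NA yields NA, so an incomplete pair gives
# NA rather than 0. Assumes at most one row per timepoint in each group.
res <- df %>%
  summarise(
    diff = value[match("b", timepoint)] - value[match("a", timepoint)],
    .by = c(subject, session)
  )

res
```

With this variant the row order within a group no longer matters, and the unpaired session for subject 2 is reported as NA.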

score 2 · Answer 2 · answered Apr 13 '23 at 10:46

You can either reshape the data wider and then compute the difference between two variables as normal, or you can keep the data in long format and extract a specific element from each vector (broken down by group) as suggested by TarJae.

Reshaping wider (as shown below) has the advantage that it does not require a and b to be in the correct order.

library(tidyverse)

df |> 
  pivot_wider(
    names_from = timepoint,
    values_from = value
  ) |> 
  mutate(difference = b - a)
#> # A tibble: 5 × 6
#>   subject var    session     a     b difference
#>     <int> <chr>  <chr>   <int> <int>      <int>
#> 1       1 SMSPAg T1         50    48         -2
#> 2       1 SMSPAg T2         52    65         13
#> 3       1 SMSPAg T3         51    61         10
#> 4       2 SMSPAg T1         53    50         -3
#> 5       2 SMSPAg T2         54    NA         NA

Created on 2023-04-13 with reprex v2.0.2

where

df <- tribble(
  ~subject,     ~var, ~session, ~timepoint, ~value,
        1L, "SMSPAg",     "T1",        "a",    50L,
        1L, "SMSPAg",     "T1",        "b",    48L,
        1L, "SMSPAg",     "T2",        "a",    52L,
        1L, "SMSPAg",     "T2",        "b",    65L,
        1L, "SMSPAg",     "T3",        "a",    51L,
        1L, "SMSPAg",     "T3",        "b",    61L,
        2L, "SMSPAg",     "T1",        "a",    53L,
        2L, "SMSPAg",     "T1",        "b",    50L,
        2L, "SMSPAg",     "T2",        "a",    54L
  )
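
To illustrate the point about row order, here is a small sketch that runs the same reshaping on the data and on a copy with the rows reversed, so that "b" comes before "a" within each session. The helper name wide_diff and the object reversed are made up for this illustration; the data is the same as above, written with data.frame() to keep the block self-contained.

```r
library(dplyr)
library(tidyr)

df <- data.frame(
  subject   = c(1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L),
  var       = "SMSPAg",
  session   = c("T1", "T1", "T2", "T2", "T3", "T3", "T1", "T1", "T2"),
  timepoint = c("a", "b", "a", "b", "a", "b", "a", "b", "a"),
  value     = c(50L, 48L, 52L, 65L, 51L, 61L, 53L, 50L, 54L)
)

# Hypothetical helper: reshape wider, compute b - a, and sort the result
# so that two runs can be compared row by row.
wide_diff <- function(data) {
  data |>
    pivot_wider(names_from = timepoint, values_from = value) |>
    mutate(difference = b - a) |>
    arrange(subject, session)
}

# Reverse the rows so "b" precedes "a" within each subject/session.
reversed <- df[nrow(df):1, ]

# Compare the differences from both row orders.
identical(wide_diff(df)$difference, wide_diff(reversed)$difference)
```

Because the columns are addressed by name after reshaping, both row orders should give the same differences, whereas an approach based on first() and last() would flip the sign on the reversed data.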
