Create new columns by subtracting column pairs from each other in R

Question

In psychology and related disciplines, we often have many pairs of variables whose names carry a suffix such as "_T1" or "_T3" to signify the time point.

I would like to subtract each column with the suffix "_T1" from its counterpart with the suffix "_T3", row by row, creating one new column per variable pair (i.e. a difference score for every participant).

Would prefer a dplyr solution, but anything goes.

Apologies for egregiously breaking any codes of conduct on this first post of mine.

Try `cbind(df1, df1[grep("_T1", names(df1))] - df1[grep("_T3", names(df1))])` assuming the "_T1" and "_T3" columns appear in the same order — akrun, Nov 24 '17 at 18:01
Please see my answer. Next time, please include a reproducible example of your dataset (https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example-aka-mcve-minimal-complete-and-ver) and the desired output when asking a question. — www, Nov 24 '17 at 18:13

www · Accepted Answer · 2017-11-24T18:18:39.363

A solution using dplyr and tidyr.

First, let's create an example data frame. It contains T1 and T3 data for two column pairs, A and B (the code below labels them Participant), with one row per ID.

# Set the seed for reproducibility
set.seed(123)

# Create an example data frame
dt <- data.frame(ID = 1:10,
                 A_T1 = runif(10),
                 A_T3 = runif(10),
                 B_T1 = runif(10),
                 B_T3 = runif(10))
dt
#     ID      A_T1       A_T3      B_T1       B_T3
#  1   1 0.2875775 0.95683335 0.8895393 0.96302423
#  2   2 0.7883051 0.45333416 0.6928034 0.90229905
#  3   3 0.4089769 0.67757064 0.6405068 0.69070528
#  4   4 0.8830174 0.57263340 0.9942698 0.79546742
#  5   5 0.9404673 0.10292468 0.6557058 0.02461368
#  6   6 0.0455565 0.89982497 0.7085305 0.47779597
#  7   7 0.5281055 0.24608773 0.5440660 0.75845954
#  8   8 0.8924190 0.04205953 0.5941420 0.21640794
#  9   9 0.5514350 0.32792072 0.2891597 0.31818101
# 10  10 0.4566147 0.95450365 0.1471136 0.23162579

We can use dplyr and tidyr to convert the data frame from wide format to long format and then compute the difference. Here Diff is T1 minus T3; use T3 - T1 in the mutate call if you want the change from T1 to T3, as described in the question.

# Load packages
library(dplyr)
library(tidyr)

dt2 <- dt %>%
  gather(Column, Value, -ID) %>%
  separate(Column, into = c("Participant", "Group")) %>%
  spread(Group, Value) %>%
  mutate(Diff = T1 - T3)

dt2
#    ID Participant        T1         T3        Diff
# 1   1           A 0.2875775 0.95683335 -0.66925583
# 2   1           B 0.8895393 0.96302423 -0.07348492
# 3   2           A 0.7883051 0.45333416  0.33497098
# 4   2           B 0.6928034 0.90229905 -0.20949564
# 5   3           A 0.4089769 0.67757064 -0.26859371
# 6   3           B 0.6405068 0.69070528 -0.05019846
# 7   4           A 0.8830174 0.57263340  0.31038400
# 8   4           B 0.9942698 0.79546742  0.19880236
# 9   5           A 0.9404673 0.10292468  0.83754260
# 10  5           B 0.6557058 0.02461368  0.63109211
# 11  6           A 0.0455565 0.89982497 -0.85426847
# 12  6           B 0.7085305 0.47779597  0.23073450
# 13  7           A 0.5281055 0.24608773  0.28201775
# 14  7           B 0.5440660 0.75845954 -0.21439351
# 15  8           A 0.8924190 0.04205953  0.85035951
# 16  8           B 0.5941420 0.21640794  0.37773408
# 17  9           A 0.5514350 0.32792072  0.22351430
# 18  9           B 0.2891597 0.31818101 -0.02902127
# 19 10           A 0.4566147 0.95450365 -0.49788891
# 20 10           B 0.1471136 0.23162579 -0.08451214

If the original wide format is preferred, we can spread the data frame back so that each pair gets its own difference column.

dt3 <- dt2 %>%
  select(-starts_with("T")) %>%
  spread(Participant, Diff)

dt3
#    ID          A           B
# 1   1 -0.6692558 -0.07348492
# 2   2  0.3349710 -0.20949564
# 3   3 -0.2685937 -0.05019846
# 4   4  0.3103840  0.19880236
# 5   5  0.8375426  0.63109211
# 6   6 -0.8542685  0.23073450
# 7   7  0.2820178 -0.21439351
# 8   8  0.8503595  0.37773408
# 9   9  0.2235143 -0.02902127
# 10 10 -0.4978889 -0.08451214
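The same reshape-and-subtract idea can also be sketched with `pivot_longer()` and `pivot_wider()`, which tidyr (version 1.0.0 and later) offers as replacements for `gather()` and `spread()`. This is only a sketch under some assumptions: the small data frame and the names `Variable`, `Time`, `diffs` and `result` are made up for illustration, every column other than `ID` is assumed to contain exactly one underscore, and the difference is taken as T3 minus T1 as the question describes.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data: two measures (A, B) at two time points
dt <- data.frame(ID   = 1:3,
                 A_T1 = c(1, 2, 3),
                 A_T3 = c(4, 6, 8),
                 B_T1 = c(10, 20, 30),
                 B_T3 = c(15, 18, 33))

diffs <- dt %>%
  # Split names such as "A_T1" into Variable = "A" and Time = "T1"
  pivot_longer(-ID, names_to = c("Variable", "Time"), names_sep = "_") %>%
  # One column per time point so the two can be subtracted
  pivot_wider(names_from = Time, values_from = value) %>%
  mutate(Diff = T3 - T1) %>%
  select(ID, Variable, Diff) %>%
  # Back to wide format: one "<Variable>_diff" column per pair
  pivot_wider(names_from = Variable, values_from = Diff,
              names_glue = "{Variable}_diff")

# Attach the difference scores to the original columns
result <- left_join(dt, diffs, by = "ID")
result
```

Joining back on `ID` keeps the original T1 and T3 columns next to the new `A_diff` and `B_diff` columns, which is closer to what the question asks for than a data frame holding only the differences.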

score 2 · Answer 2 · answered Nov 24 '17 at 18:13

Assuming all data are in a data frame d, the following stores the difference scores (T3 minus T1) in new columns ending in _diff:

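library(stringr)
# Names of the T1 and T3 columns; this assumes both sets appear in the same order
t1_vars <- grep("_T1$", colnames(d), value = TRUE)
t3_vars <- grep("_T3$", colnames(d), value = TRUE)
# Drop the trailing "_T1" (3 characters), append "_diff", and store T3 - T1
d[, paste0(str_sub(t1_vars, end = -4), "_diff")] <- d[, t3_vars] - d[, t1_vars]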
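The positional pairing above depends on the "_T1" and "_T3" columns appearing in the same order. A possible variation, sketched here in base R, pairs the columns by name instead. The toy data frame and the helper name `stems` are hypothetical, and the columns are deliberately shuffled to show that order does not matter.

```r
# Hypothetical toy data with the T1/T3 columns deliberately out of order
d <- data.frame(ID   = 1:3,
                B_T3 = c(15, 18, 33),
                A_T1 = c(1, 2, 3),
                B_T1 = c(10, 20, 30),
                A_T3 = c(4, 6, 8))

# Name stems of all columns ending in "_T1", e.g. "A" from "A_T1"
stems <- sub("_T1$", "", grep("_T1$", names(d), value = TRUE))

# Keep only stems that also have a matching "_T3" column
stems <- stems[paste0(stems, "_T3") %in% names(d)]

# Build both column sets from the same stems so each pair lines up by name,
# then store T3 - T1 in new "<stem>_diff" columns
d[paste0(stems, "_diff")] <- d[paste0(stems, "_T3")] - d[paste0(stems, "_T1")]
d
```

Because both sides of the subtraction are built from the same `stems` vector, an unpaired column (a "_T1" without a "_T3") is skipped rather than silently shifting the remaining pairs.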
