0

I would like to compute the sums of multiple grouped columns in my dataset and store those sums as new columns, but I do not know how to encode this in R language.

Setting: 100 participants played a video game and rated various aspects using standardized questionnaires. The data is in wide format, so I have 100 rows (one row per participants) and a certain number of columns pertaining to each aspect of the video game that was rated.

To show what my dataset looks like, a simplified version of it is given in this image.

Details: Because the questionnaire contains three distinct subscales, I need to sum all participants' individual ratings for each subscale (highlighted Green, Yellow, Blue in the image). The sums of each subscale must be added as a new column (resulting in sums for three subscales). This is illustrated by the Red columns in the image.

I do not know how to efficiently encode this in R language. Could anyone provide some directions?

Dion Groothof
  • 1,406
  • 5
  • 15
stevemn
  • 1
  • 1
  • 2
  • 4
    Can you please provide a reproducible example? https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example – Jon Spring Jan 07 '22 at 00:36

1 Answers1

0

Please provide a reproducible example the next time you ask a question (as @Jon Spring also commented). For now, I will provide one.

Assuming that the column names in the image you have provided are alike the column names in your own data and also, equally important, that e.g. the green columns to be summed in subscale_1 always correspond to the 1st, 3rd, and 6th column of Buttons_ and so forth, the following lines of code should realize what you are looking for (although I am sure that there are more efficient ways to achieve this).

First, let's make a reprex with which we can test our code.

set.seed(1)

# no. of participants
n <- 100L

# columns in the desired format
n_cols <- 21L
df <- data.frame(matrix(rep(NA, times = n * n_cols), ncol = n_cols, byrow = TRUE))
names(df) <- c('id', paste0('LoadScreen_Buttons_', 1:7),
               paste0('LoadScreen_Subscale_', 1:3), paste0('GameHUD_Buttons_', 1:7),
               paste0('GameHUD_Buttons_Subscale_', 1:3))

# subject id
df$id <- 1L:100L

# allocate random values to the buttons columns
buttons_cols <- grep('ns_[[:digit:]]', names(df), value = TRUE)
df[, buttons_cols] <- apply(df[, buttons_cols], 2, \(x) sample(1L:7L, n, replace = TRUE))

The structure of df (for the column names starting with ‘LoadScreen’ for the first two participants) is as follows.

> head(df[, c('id', grep('^L', names(df), value = TRUE))], 2)
  id LoadScreen_Buttons_1 LoadScreen_Buttons_2 LoadScreen_Buttons_3 LoadScreen_Buttons_4 LoadScreen_Buttons_5
1  1                    1                    1                    2                    7                    6
2  2                    4                    7                    2                    4                    6
  LoadScreen_Buttons_6 LoadScreen_Buttons_7 LoadScreen_Subscale_1 LoadScreen_Subscale_2 LoadScreen_Subscale_3
1                    4                    5                    NA                    NA                    NA
2                    5                    6                    NA                    NA                    NA

Now we need to extract the names of the subscale columns in df. We also need to extract the 1st, 3rd, and 6th bottons columns (corresponding to the green columns in your image), take the rowSums of those, and store the results in the subscale column pertaining to the specific buttons columns. The same applies to the yellow and blue coloured columns in the image that you have provided. Finally, because we have i groups of buttons columns (i = 2 in this reprex, namely we have LoadScreen and GameHUD), we iterate this proces i times using a three for loops, one loop for each colour.

# compute buttons subscales -----------------

# extract column names from df that include the buttoms columns
buttons <- grep('ns_[[:digit:]]', names(df), value = TRUE)

# extract column names from df that include the subscales
subscales <- grep('e_[[:digit:]]$', names(df), value = TRUE)

# green ------------
green <- grep('1$|3$|6$', buttons, value = TRUE)
mat <- matrix(green, ncol = length(green)/3L)
for(i in 1:dim(mat)[2]) {
  df[, subscales[endsWith(subscales, '1')][i]] <- rowSums(df[, mat[, i]])
}
# yellow ------------
yellow <- grep('2$|7$', buttons, value = TRUE)
mat <- matrix(yellow, ncol = length(yellow)/2L)
for(i in 1:dim(mat)[2]) {
  df[, subscales[endsWith(subscales, '2')][i]] <- rowSums(df[, mat[, i]])
}
# blue ------------
blue <- grep('4$|5$', buttons, value = TRUE)
mat <- matrix(blue, ncol = length(blue)/2L)
for(i in 1:dim(mat)[2]) {
  df[, subscales[endsWith(subscales, '3')][i]] <- rowSums(df[, mat[, i]])
}

Result.

> head(df[, c('id', grep('^L', names(df), value = TRUE))], 2)
  id LoadScreen_Buttons_1 LoadScreen_Buttons_2 LoadScreen_Buttons_3 LoadScreen_Buttons_4 LoadScreen_Buttons_5
1  1                    1                    1                    2                    7                    6
2  2                    4                    7                    2                    4                    6
  LoadScreen_Buttons_6 LoadScreen_Buttons_7 LoadScreen_Subscale_1 LoadScreen_Subscale_2 LoadScreen_Subscale_3
1                    4                    5                     7                     6                    13
2                    5                    6                    11                    13                    10
Dion Groothof
  • 1,406
  • 5
  • 15