Complex dataframe values selection based on both rows and columns

Question

I need to select some values on each row of the dataset below and compute a sum.

This is a part of my dataset.

> prova
   key_duration1 key_duration2 key_duration3 KeyPress1RESP KeyPress2RESP KeyPress3RESP
18          3483           364          3509             b             n             m
19          2367           818          3924             b             n             m
20          3775          1591           802             b             m             n
21           929          3059           744             n             b             n
22          3732           530          1769             b             n             m
23          3503          2011          2932             b             n             b
24          3684          1424          1688             b             n             m

Rows are trials of the experiment. The columns hold the keys pressed, in temporal order (KeyPress1RESP, KeyPress2RESP, ...), and the time from each key press until the next one (key_duration1, key_duration2, ...).

So, for example, in the first trial (first row) I pressed "b", and after 3483 ms I pressed "n", and so on.
This is my data frame:

structure(list(key_duration1 = c(3483L, 2367L, 3775L, 929L, 3732L, 
3503L, 3684L), key_duration2 = c(364L, 818L, 1591L, 3059L, 530L, 
2011L, 1424L), key_duration3 = c(3509, 3924, 802, 744, 1769, 
2932, 1688), KeyPress1RESP = structure(c(2L, 2L, 2L, 4L, 2L, 
2L, 2L), .Label = c("", "b", "m", "n"), class = "factor"), KeyPress2RESP = structure(c(4L, 
4L, 3L, 2L, 4L, 4L, 4L), .Label = c("", "b", "m", "n"), class = "factor"), 
    KeyPress3RESP = structure(c(3L, 3L, 4L, 4L, 3L, 2L, 3L), .Label = c("", 
    "b", "m", "n"), class = "factor")), row.names = 18:24, class = "data.frame")

I need a way to select, in each row (trial), all "b" key presses, compute the sum of the corresponding key_duration values, and store the result in a new column. Then the same for "m".

How can I do this?

I think I need something like 'apply()', except that it should sum only the selected values in each row rather than every value:

apply(prova[,1:3],1,sum)

Thanks

Please take a look at [How to make a great R reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) and modify your question to include a smaller sample taken from your data (check `?dput`). Posting images of your data, or no data at all, makes it difficult or impossible for us to help you! — massisenergy, Dec 22 '18 at 19:46
@massisenergy Thanks for your tips, and sorry, I'm not skilled in this field. I've tried to modify the question and I've added the dput() output. — Filippo Gambarota, Dec 22 '18 at 19:52
@FilippoGambarota Could you also show the expected output? You mention `"b"` and `"m"` - what about `"n"`? — markus, Dec 22 '18 at 19:57
@markus I need an apply()-like function, but one that sums only the "b" values of each row rather than all of them and adds the result as a new column, and then does the same for the "m" values. I don't need the "n" values. — Filippo Gambarota, Dec 22 '18 at 20:02
Are your columns fixed? I mean, do you have columns other than these ones in the sample data (e.g., `key_duration4` and `keypress4RESP`)? — Taher A. Ghaleb, Dec 22 '18 at 20:37
@TeeKea Yes! I have a lot of columns like this. Furthermore, the final number of columns is unknown because it depends on the number of keys pressed by the subjects, so I'm looking for a generalizable method. — Filippo Gambarota, Dec 22 '18 at 21:11
Did the answers below solve the problem? Could you accept and/or upvote any that helped? — Joe, Jul 30 '19 at 14:30

score 0 · Answer 1 · answered Dec 22 '18 at 20:29

Here is a way using data.table.

library(data.table)
setDT(prova)

# melt
prova_long <-
  melt(
    prova[, idx := 1:.N],
    id.vars = "idx",
    measure.vars = patterns("^key_duration", "^KeyPress"),
    variable.name = "key",
    value.name = c("duration", "RESP")
  )

# aggregate
prova_aggr <- prova_long[RESP != "n", .(duration_sum = sum(duration)), by = .(idx, RESP)]

# spread and join
prova[dcast(prova_aggr, idx ~ paste0("sum_", RESP)), c("sum_b", "sum_m") := .(sum_b, sum_m), on = "idx"]
prova

Result

#   key_duration1 key_duration2 key_duration3 KeyPress1RESP KeyPress2RESP KeyPress3RESP idx sum_b sum_m
#1:          3483           364          3509             b             n             m   1  3483  3509
#2:          2367           818          3924             b             n             m   2  2367  3924
#3:          3775          1591           802             b             m             n   3  3775  1591
#4:           929          3059           744             n             b             n   4  3059    NA
#5:          3732           530          1769             b             n             m   5  3732  1769
#6:          3503          2011          2932             b             n             b   6  6435    NA
#7:          3684          1424          1688             b             n             m   7  3684  1688

The idea is to reshape your data to long format and aggregate the durations by "RESP" within each row, then spread the result back to wide format and join it to your initial data.
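For readers without data.table, the same reshape, aggregate, spread, and join idea can be sketched in base R. This is only an illustration, not the answer's method: it swaps in `tapply()` for the aggregate and spread steps, rebuilds the question's data with plain character columns, and assumes the i-th `key_duration` column belongs to the i-th `KeyPress` column. The names `dur_cols`, `key_cols`, `long`, and `sums` are made up for this example.

```r
# Sample data from the question (KeyPress columns as plain character vectors).
prova <- data.frame(
  key_duration1 = c(3483, 2367, 3775, 929, 3732, 3503, 3684),
  key_duration2 = c(364, 818, 1591, 3059, 530, 2011, 1424),
  key_duration3 = c(3509, 3924, 802, 744, 1769, 2932, 1688),
  KeyPress1RESP = c("b", "b", "b", "n", "b", "b", "b"),
  KeyPress2RESP = c("n", "n", "m", "b", "n", "n", "n"),
  KeyPress3RESP = c("m", "m", "n", "n", "m", "b", "m"),
  stringsAsFactors = FALSE
)

# Assumption: columns are paired by position (1st duration with 1st key, ...).
dur_cols <- grep("^key_duration", names(prova))
key_cols <- grep("^KeyPress", names(prova))

# 1. Reshape to long format: one row per (trial, key press).
long <- data.frame(
  idx      = rep(seq_len(nrow(prova)), times = length(dur_cols)),
  duration = unlist(prova[dur_cols], use.names = FALSE),
  RESP     = unlist(lapply(prova[key_cols], as.character), use.names = FALSE)
)

# 2. Aggregate and spread in one step: tapply() returns a trial-by-key matrix,
#    with NA where a key was never pressed in that trial.
sums <- tapply(long$duration, list(idx = long$idx, RESP = long$RESP), sum)

# 3. Join back: the rows of `sums` are in trial order, so assign directly.
prova$sum_b <- unname(sums[, "b"])
prova$sum_m <- unname(sums[, "m"])
prova
```

As in the data.table result, trials in which "m" was never pressed get `NA` in `sum_m` rather than 0.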

tmfmnk · Answer 2 · 2018-12-22T20:36:21.323

With tidyverse you can do:

library(tidyverse)

df <- prova  # the data frame from the question

bind_cols(df %>%
 select(starts_with("KeyPress")) %>%
 rowid_to_column() %>%
 gather(var, val, -rowid), df %>%
 select(starts_with("key_")) %>%
 rowid_to_column() %>%
 # name the duration columns var1/val1 explicitly and drop the second rowid,
 # so the result does not depend on how bind_cols() renames duplicates
 gather(var1, val1, -rowid) %>%
 select(-rowid)) %>%
 group_by(rowid) %>%
 summarise(b_values = sum(val1[val == "b"]),
           m_values = sum(val1[val == "m"])) %>%
 left_join(df %>%
            rowid_to_column(), by = "rowid") %>%
 ungroup() %>%
 select(-rowid)

  b_values m_values key_duration1 key_duration2 key_duration3 KeyPress1RESP KeyPress2RESP KeyPress3RESP
     <dbl>    <dbl>         <int>         <int>         <dbl> <fct>         <fct>         <fct>        
1    3483.    3509.          3483           364         3509. b             n             m            
2    2367.    3924.          2367           818         3924. b             n             m            
3    3775.    1591.          3775          1591          802. b             m             n            
4    3059.       0.           929          3059          744. n             b             n            
5    3732.    1769.          3732           530         1769. b             n             m            
6    6435.       0.          3503          2011         2932. b             n             b            
7    3684.    1688.          3684          1424         1688. b             n             m

First, it splits df into two parts: one with the variables starting with "KeyPress" and one with the variables starting with "key_". Second, it transforms both parts from wide to long format and combines them by columns. Third, it creates a summary of the "b" and "m" values by row ID. Finally, it merges the results with the original df.

Joe · Answer 3 · 2018-12-23T08:54:03.090

You can make a logical matrix from the KeyPress columns, multiply it by the key_duration columns, and then take the row sums with rowSums().

prova$b_values <- rowSums((prova[, 4:6] == "b") * prova[, 1:3])
prova$n_values <- rowSums((prova[, 4:6] == "n") * prova[, 1:3])


   key_duration1 key_duration2 key_duration3 KeyPress1RESP KeyPress2RESP KeyPress3RESP b_values n_values
18          3483           364          3509             b             n             m     3483     364
19          2367           818          3924             b             n             m     2367     818
20          3775          1591           802             b             m             n     3775     802
21           929          3059           744             n             b             n     3059    1673
22          3732           530          1769             b             n             m     3732     530
23          3503          2011          2932             b             n             b     6435    2011
24          3684          1424          1688             b             n             m     3684    1424

This works because the logical values are coerced to numeric 1s and 0s in the multiplication, so only the durations belonging to the chosen key are retained in each row.
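The coercion can be seen on a single trial. The sketch below uses the values from row 23 of the question ("b", "n", "b"); the variable names are made up for illustration.

```r
# Was "b" the key pressed at each of the three positions in row 23?
pressed  <- c(TRUE, FALSE, TRUE)
duration <- c(3503, 2011, 2932)

# TRUE is coerced to 1 and FALSE to 0, so non-"b" durations are zeroed out.
masked <- pressed * duration
masked        # 3503    0 2932
sum(masked)   # 6435
```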

Extra: to generalise, you could instead use a function and tidyverse/purrr to map it:

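library(tidyverse)

# sum the durations for one key, row by row
get_sums <- function(key) rowSums((prova[, 4:6] == key) * prova[, 1:3])
keylist <- list(b_values = "b", n_values = "n", m_values = "m")

# map_dfc() column-binds the results, giving one new column per key
bind_cols(prova, map_dfc(keylist, get_sums))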
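Since the asker mentions that the number of key columns is not known in advance, the hard-coded `4:6` and `1:3` could be replaced by name patterns. The following is a minimal base R sketch of that idea, not part of the original answer. It assumes the i-th `key_duration` column pairs with the i-th `KeyPress` column, and that unused trailing slots may contain `NA` (hence `na.rm = TRUE`). The helper `sum_for_key` and the names `wanted`, `totals`, and `result` are hypothetical; the data are three trials taken from the question.

```r
# Three trials from the question (rows 18, 21 and 23).
prova <- data.frame(
  key_duration1 = c(3483, 929, 3503),
  key_duration2 = c(364, 3059, 2011),
  key_duration3 = c(3509, 744, 2932),
  KeyPress1RESP = c("b", "n", "b"),
  KeyPress2RESP = c("n", "b", "n"),
  KeyPress3RESP = c("m", "n", "b"),
  stringsAsFactors = FALSE
)

# Select columns by name pattern instead of by position.
dur  <- as.matrix(prova[grep("^key_duration", names(prova))])
keys <- as.matrix(prova[grep("^KeyPress", names(prova))])
stopifnot(ncol(dur) == ncol(keys))  # columns must pair up one-to-one

# Sum the durations of one key per row; NA entries are ignored.
sum_for_key <- function(key) {
  rowSums((keys == key) * dur, na.rm = TRUE)
}

# One new column per key of interest.
wanted <- c(sum_b = "b", sum_m = "m")
totals <- sapply(wanted, sum_for_key)  # matrix: one row per trial, one column per key
result <- cbind(prova, totals)
result
```

With this approach a key that was never pressed in a trial yields 0 rather than `NA`.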
