0

I would really appreciate your help in using R for data wrangling. I have a data where I want to split one column (variable) into two whenever applicable as conditioned by other variables. For example, as per the sample below, the data represents reactions time measures (RT1 and RT2) of some words (item) that appear in different times of reading (block). I want to see if RT1 and RT2 values in block 3, 4, and 5 are correlated with RT1 and RT2 values of the same item at block 1. The target items that appeared in block 1 and re-appeared in subsequent blocks are coded as 'EI' in the column 'condition', whereas items coded as 'E' or 'I' appeared only once.

dput(d1)
structure(list(RECORDING_SESSION_LABEL = c(26, 26, 26, 26, 26, 
26, 26, 26), RT1 = c(5171, 3857, 3447, 314, 460, 731, 957, 1253
), RT2 = c(357, 328, 122, 39, 86, 132, 173, 215), item = c("foreign", 
"detailed", "large", "foreign", "foreign", "large", "large", 
"disputable"), block = c(1, 1, 1, 3, 4, 3, 4, 3), condition = c("EI", 
"E", "EI", "EI", "EI", "EI", "EI", "I")), row.names = c(NA, -8L
), class = c("tbl_df", "tbl", "data.frame"))

Where a sample of the data would look like this:

> d1
# A tibble: 8 x 6
  RECORDING_SESSION_LABEL   RT1   RT2 item       block condition
                    <dbl> <dbl> <dbl> <chr>      <dbl> <chr>    
1                      26  5171   357 foreign        1 EI       
2                      26  3857   328 detailed       1 E        
3                      26  3447   122 large          1 EI       
4                      26   314    39 foreign        3 EI       
5                      26   460    86 foreign        4 EI       
6                      26   731   132 large          3 EI       
7                      26   957   173 large          4 EI       
8                      26  1253   215 disputable     3 I   

In order to present in a format that R would understand, the target data frame I want to achieve would be similar to the one below (where the highlighted columns should be added). Rows in blanks at these columns represent items which do not appear repetitively (condition is not coded as 'EI') ; therefore, they are irrelevant and should be coded as 'NA'.

dput(d2)
structure(list(RECORDING_SESSION_LABEL = c(26, 26, 26, 26, 26, 
26, 26, 26), `RT 1` = c(5171, 3857, 3447, 314, 460, 731, 957, 
1253), RT2 = c(357, 328, 122, 39, 86, 132, 173, 215), item = c("foreign", 
"detailed", "large", "foreign", "foreign", "large", "large", 
"disputable"), block = c(1, 1, 1, 3, 4, 3, 4, 3), condition = c("EI", 
"E", "EI", "EI", "EI", "EI", "EI", "I"), `RT 1_at_block1` = c(NA, 
NA, NA, 5171, 5171, 3447, 3447, NA), RT2_at_block1 = c(NA, NA, 
NA, 357, 357, 122, 122, NA)), row.names = c(NA, -8L), class = c("tbl_df", 
"tbl", "data.frame"))

And a sample of the data format targeted would look like this:

> d2
# A tibble: 8 x 8
  RECORDING_SESSI~ `RT 1`   RT2 item  block condition `RT 1_at_block1`
             <dbl>  <dbl> <dbl> <chr> <dbl> <chr>                <dbl>
1               26   5171   357 fore~     1 EI                      NA
2               26   3857   328 deta~     1 E                       NA
3               26   3447   122 large     1 EI                      NA
4               26    314    39 fore~     3 EI                    5171
5               26    460    86 fore~     4 EI                    5171
6               26    731   132 large     3 EI                    3447
7               26    957   173 large     4 EI                    3447
8               26   1253   215 disp~     3 I                       NA
# ... with 1 more variable: RT2_at_block1 <dbl>

> head(d2)
# A tibble: 6 x 8
  RECORDING_SESSION_LABEL `RT 1`   RT2 item     block condition `RT 1_at_block1` RT2_at_block1
                    <dbl>  <dbl> <dbl> <chr>    <dbl> <chr>                <dbl>         <dbl>
1                      26   5171   357 foreign      1 EI                      NA            NA
2                      26   3857   328 detailed     1 E                       NA            NA
3                      26   3447   122 large        1 EI                      NA            NA
4                      26    314    39 foreign      3 EI                    5171           357
5                      26    460    86 foreign      4 EI                    5171           357
6                      26    731   132 large        3 EI                    3447           122

Thanks in advance for any help.

azizi tamimi
  • 53
  • 1
  • 8
  • 2
    Please use `dput` to sshow the example instead of imagess. thanks – akrun Feb 09 '20 at 18:01
  • Please add a [minimal reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example/5963610#5963610). That way you can help others to help you! Also your question is not easy to understand, an MRE (with emphasis on **minimal**) including expected out would be appreciated! – dario Feb 09 '20 at 18:03
  • thanks for your comment @dario, I hope it is set appropriately now. – azizi tamimi Feb 09 '20 at 19:14
  • You mention that: "Rows in blanks at these columns represent items which do not appear repetitively (condition is not coded as 'EI')". But in your `d2` rows 1 and 3 have `NA` in your new columns while condition **is** EI. – Valeri Voev Feb 09 '20 at 19:26
  • @azizi tamimi Thanks, looks good. It's much easier now to try and help you! – dario Feb 09 '20 at 19:38
  • Thanks for your comment @ValeriVoev. You are right and sorry for the confusion, but I guess I did not input their values because they are irrelevant to the analysis (I am checking the effect of RT1 and RT2 at block 1 on the effect of RT1 and RT2 at block 2:4. – azizi tamimi Feb 10 '20 at 09:09

1 Answers1

0

A possible solution using dplyr:

  d1 <- structure(list(RECORDING_SESSION_LABEL = c(26, 26, 26, 26, 26, 26, 26, 26),
                       RT1 = c(5171, 3857, 3447, 314, 460, 731, 957, 1253),
                       RT2 = c(357, 328, 122, 39, 86, 132, 173, 215),
                       item = c("foreign", "detailed", "large", "foreign", "foreign", "large", "large", "disputable"),
                       block = c(1, 1, 1, 3, 4, 3, 4, 3), condition = c("EI", "E", "EI", "EI", "EI", "EI", "EI", "I")),
                  row.names = c(NA, -8L), class = c("tbl_df", "tbl", "data.frame"))



  library(dplyr)

  d2 <- d1 %>% 
    left_join(d1 %>% filter(block == 1) %>% select(RECORDING_SESSION_LABEL, item, RT1_at_block1 = RT1)) %>% 
    left_join(d1 %>% filter(block == 1) %>% select(RECORDING_SESSION_LABEL, item, RT2_at_block1 = RT2))

After that, d2 looks like this:

    RECORDING_SESSION_LABEL   RT1   RT2 item     block condition RT1_at_block1 RT2_at_block1
                      <dbl> <dbl> <dbl> <chr>    <dbl> <chr>             <dbl>         <dbl>
  1                      26  5171   357 foreign      1 EI                 5171           357
  2                      26  3857   328 detailed     1 E                  3857           328
  3                      26  3447   122 large        1 EI                 3447           122
  4                      26   314    39 foreign      3 EI                 5171           357
  5                      26   460    86 foreign      4 EI                 5171           357
  6                      26   731   132 large        3 EI                 3447           122

Edit: Adding a mutate if you want to set the values for block 1 to NA:

d2 <- d1 %>% 
  left_join(d1 %>% filter(block == 1) %>% select(RECORDING_SESSION_LABEL, item, RT1_at_block1 = RT1)) %>% 
  left_join(d1 %>% filter(block == 1) %>% select(RECORDING_SESSION_LABEL, item, RT2_at_block1 = RT2)) %>% 
  mutate(RT1_at_block1 = ifelse(block == 1, NA, RT1_at_block1),
         RT2_at_block1 = ifelse(block == 1, NA, RT2_at_block1)) 
dario
  • 6,415
  • 2
  • 12
  • 26