Replacing values in a data frame based on column names, ID numbers, and key values from other data frames

Question

I'm trying to aggregate data from multiple sources.
In the example below, I'd like to create a new data frame called RESULT. I need to replace the values from the cells in dt2 with the values from SCORE in dt1 based the ID and TASK variables.

dt1 <- read.table(text = "
ID   TASK        SCORE  
1    task1        0 
1    task2        0 
2    task1        1 
2    task2        0 
3    task1        1 
3    task2        0 
4    task1        0 
4    task2        1
5    task1        1 
5    task2        0 
6    task1        0 
6    task2        1 
", header = TRUE, stringsAsFactors = FALSE)

dt2 <- read.table(text = "
ID   THING1    THING2   THING3   THING4
1        NA     task2    NA      task1
2        NA     NA       NA      NA
3        task1  NA       task2   NA
4        NA     task1    NA      task2
5        task2  NA       task1   NA
6        NA     task1    task2   NA
", header = TRUE, stringsAsFactors = FALSE)

RESULT <- read.table(text = "
ID   THING1  THING2  THING3  THING4
1        NA     0       NA      0
2        NA     NA      NA      NA
3        1      NA      0       NA
4        NA     0       NA      1
5        0      NA      1       NA
6        NA     0       1       NA 
", header = TRUE, stringsAsFactors = FALSE)

For example, in dt1, we can see that ID = 1 received a score of 0 on task1. From dt2, we can see that ID = 1 did THING4 for their first TASK. As shown in the RESULT dataframe, ID = 1 now has a value of 0 for THING2.

I am not quite sure where to start. I've read through several threads but none of them (that I can find) solve this exact problem (for example here and here) and I'm not quite sure how to modify these examples to get the result I need because this kind of programming is very new to me. So I'd really appreciate any suggestions!

akrun · Accepted Answer · 2020-03-08T20:22:22.850

We can convert to 'long' format, do a join and then reshape back to 'wide' after replacing the values

library(dplyr)
library(tidyr)
dt2 %>% 
   pivot_longer(cols = -ID, values_to = 'TASK') %>%
   left_join(dt1) %>%
   mutate(TASK = coalesce(as.character(SCORE), TASK)) %>% 
   select(-SCORE) %>% 
   pivot_wider(names_from = name, values_from = TASK)
# A tibble: 6 x 5
#     ID THING1 THING2 THING3 THING4
#  <int> <chr>  <chr>  <chr>  <chr> 
#1     1 <NA>   0      <NA>   0     
#2     2 <NA>   <NA>   <NA>   <NA>  
#3     3 1      <NA>   0      <NA>  
#4     4 <NA>   0      <NA>   1     
#5     5 0      <NA>   1      <NA>  
#6     6 <NA>   0      1      <NA>

Or another option is match

library(stringr)
dt2[-1] <- dt1$SCORE[match(str_c(dt2$ID, as.matrix(dt2[-1])), 
                           str_c(dt1$ID, dt1$TASK))]

The `match` option is exactly what I was looking for. – socialresearcher Mar 08 '20 at 20:32 — socialresearcher, Mar 08 '20 at 20:32

Replacing values in a data frame based on column names, ID numbers, and key values from other data frames

1 Answers1