2

I'm trying to aggregate data from multiple sources.
In the example below, I'd like to create a new data frame called RESULT. I need to replace the values from the cells in dt2 with the values from SCORE in dt1 based the ID and TASK variables.

dt1 <- read.table(text = "
ID   TASK        SCORE  
1    task1        0 
1    task2        0 
2    task1        1 
2    task2        0 
3    task1        1 
3    task2        0 
4    task1        0 
4    task2        1
5    task1        1 
5    task2        0 
6    task1        0 
6    task2        1 
", header = TRUE, stringsAsFactors = FALSE)

dt2 <- read.table(text = "
ID   THING1    THING2   THING3   THING4
1        NA     task2    NA      task1
2        NA     NA       NA      NA
3        task1  NA       task2   NA
4        NA     task1    NA      task2
5        task2  NA       task1   NA
6        NA     task1    task2   NA
", header = TRUE, stringsAsFactors = FALSE)

RESULT <- read.table(text = "
ID   THING1  THING2  THING3  THING4
1        NA     0       NA      0
2        NA     NA      NA      NA
3        1      NA      0       NA
4        NA     0       NA      1
5        0      NA      1       NA
6        NA     0       1       NA 
", header = TRUE, stringsAsFactors = FALSE) 

For example, in dt1, we can see that ID = 1 received a score of 0 on task1. From dt2, we can see that ID = 1 did THING4 for their first TASK. As shown in the RESULT dataframe, ID = 1 now has a value of 0 for THING2.

I am not quite sure where to start. I've read through several threads but none of them (that I can find) solve this exact problem (for example here and here) and I'm not quite sure how to modify these examples to get the result I need because this kind of programming is very new to me. So I'd really appreciate any suggestions!

1 Answers1

2

We can convert to 'long' format, do a join and then reshape back to 'wide' after replacing the values

library(dplyr)
library(tidyr)
dt2 %>% 
   pivot_longer(cols = -ID, values_to = 'TASK') %>%
   left_join(dt1) %>%
   mutate(TASK = coalesce(as.character(SCORE), TASK)) %>% 
   select(-SCORE) %>% 
   pivot_wider(names_from = name, values_from = TASK)
# A tibble: 6 x 5
#     ID THING1 THING2 THING3 THING4
#  <int> <chr>  <chr>  <chr>  <chr> 
#1     1 <NA>   0      <NA>   0     
#2     2 <NA>   <NA>   <NA>   <NA>  
#3     3 1      <NA>   0      <NA>  
#4     4 <NA>   0      <NA>   1     
#5     5 0      <NA>   1      <NA>  
#6     6 <NA>   0      1      <NA>  

Or another option is match

library(stringr)
dt2[-1] <- dt1$SCORE[match(str_c(dt2$ID, as.matrix(dt2[-1])), 
                           str_c(dt1$ID, dt1$TASK))]
akrun
  • 874,273
  • 37
  • 540
  • 662