I'm trying to aggregate data from multiple sources.
In the example below, I'd like to create a new data frame called RESULT
. I need to replace the values from the cells in dt2
with the values from SCORE
in dt1
based the ID
and TASK
variables.
dt1 <- read.table(text = "
ID TASK SCORE
1 task1 0
1 task2 0
2 task1 1
2 task2 0
3 task1 1
3 task2 0
4 task1 0
4 task2 1
5 task1 1
5 task2 0
6 task1 0
6 task2 1
", header = TRUE, stringsAsFactors = FALSE)
dt2 <- read.table(text = "
ID THING1 THING2 THING3 THING4
1 NA task2 NA task1
2 NA NA NA NA
3 task1 NA task2 NA
4 NA task1 NA task2
5 task2 NA task1 NA
6 NA task1 task2 NA
", header = TRUE, stringsAsFactors = FALSE)
RESULT <- read.table(text = "
ID THING1 THING2 THING3 THING4
1 NA 0 NA 0
2 NA NA NA NA
3 1 NA 0 NA
4 NA 0 NA 1
5 0 NA 1 NA
6 NA 0 1 NA
", header = TRUE, stringsAsFactors = FALSE)
For example, in dt1
, we can see that ID = 1
received a score of 0
on task1
.
From dt2
, we can see that ID = 1
did THING4
for their first TASK
.
As shown in the RESULT
dataframe, ID = 1
now has a value of 0
for THING2
.
I am not quite sure where to start. I've read through several threads but none of them (that I can find) solve this exact problem (for example here and here) and I'm not quite sure how to modify these examples to get the result I need because this kind of programming is very new to me. So I'd really appreciate any suggestions!