Populate New Variable In Data Frame Using Relationship Between Two Variables In Another Data Frame

Question

I have two data frames with different numbers of observations: one is long with 2220 observations, the other is wide with 37. Both share the variable "SID", but the long data frame has 60 rows for each SID value while the wide one has only one. The wide data frame has an additional variable, "Experimenter", and each SID has a corresponding Experimenter number. I would like to create an "Experimenter" column in the long data frame, with the corresponding Experimenter value repeated on every row where that SID occurs (so 60 times per SID).

Nested if-else statements for every subject seem very tedious, so I'm hoping there's an alternative.

I've added the dput output from each data frame below; unfortunately, I'm not sure how to embed them. Right now "SID" is named "Subject" in the long data frame, but they are the same variable.

Wide:

structure(list(SID = 7301:7302, Experimenter = c(2L, 1L)), .Names = c("SID", 
"Experimenter"), class = "data.frame", row.names = c(NA, -2L))

Long:

structure(list(Subject = c(7301L, 7301L, 7301L), Session = c(1L, 
1L, 1L), Stimtype = structure(c(1L, 1L, 1L), .Label = "Control", class = 
"factor"), 
Valence = structure(c(1L, 1L, 1L), .Label = "Neutral", class = "factor"), 
Block = c(1L, 1L, 1L), Image = c(12L, 17L, 22L), Group = structure(c(1L, 
3L, 2L), .Label = c("Neutral_1660", "Neutral_5300", "Neutral_7233"
), class = "factor"), Response = c(1L, 1L, 1L), Stimulus = c(1660L, 
7233L, 5300L)), .Names = c("Subject", "Session", "Stimtype", 
"Valence", "Block", "Image", "Group", "Response", "Stimulus"), class = 
"data.frame", row.names = c(NA, 
-3L))

Looking at the data above, all I want to do is add an "Experimenter" variable to the long data frame that has the value "2" whenever "Subject" is "7301" (as in the wide data), and so on for all subjects.

Thank you in advance.

Please include sample data and expected output in a reproducible, copy&paste-able format. Screenshots are never a good idea. For details take a look at how to provide a [minimal reproducible example/attempt](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example). — Maurits Evers, Feb 28 '19 at 00:13
@MauritsEvers Thanks, working on creating those and updating the post right now. — Lucy Toru, Feb 28 '19 at 00:16

score 0 · Accepted Answer · answered Feb 28 '19 at 02:03

Unless I misunderstood, this seems to be a simple case of `merge`/`left_join`.

In base R

merge(df2, df1, by.x = "Subject", by.y = "SID")
#  Subject Session Stimtype Valence Block Image        Group Response Stimulus
#1    7301       1  Control Neutral     1    12 Neutral_1660        1     1660
#2    7301       1  Control Neutral     1    17 Neutral_7233        1     7233
#3    7301       1  Control Neutral     1    22 Neutral_5300        1     5300
#  Experimenter
#1            2
#2            2
#3            2

Or using dplyr

library(dplyr)
left_join(df2, df1, by = c("Subject" = "SID"))

giving the same result
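
One difference to be aware of: `merge()` sorts the result by the key column by default, so the row order of the long data can change. If only the one column is needed and the original row order should be kept, a lookup with `match()` is a possible alternative. The following is a minimal sketch; `wide` and `long` are simplified stand-ins invented for illustration, not the data from the question.

```r
# Simplified stand-ins for the wide and long data frames (hypothetical data).
wide <- data.frame(SID = 7301:7302, Experimenter = c(2L, 1L))
long <- data.frame(Subject = c(7302L, 7301L, 7301L, 7302L),
                   Image   = c(12L, 17L, 22L, 5L))

# match() returns, for each Subject, the row position of that value in wide$SID.
# Indexing wide$Experimenter with those positions repeats the right value per row.
long$Experimenter <- wide$Experimenter[match(long$Subject, wide$SID)]

long
```

Subjects that do not appear in `wide$SID` would get `NA`, which mirrors what a left join does for unmatched keys.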

Sample data

df1 <- structure(list(SID = 7301:7302, Experimenter = c(2L, 1L)), .Names = c("SID",
"Experimenter"), class = "data.frame", row.names = c(NA, -2L))

df2 <- structure(list(Subject = c(7301L, 7301L, 7301L), Session = c(1L,
1L, 1L), Stimtype = structure(c(1L, 1L, 1L), .Label = "Control", class =
"factor"),
Valence = structure(c(1L, 1L, 1L), .Label = "Neutral", class = "factor"),
Block = c(1L, 1L, 1L), Image = c(12L, 17L, 22L), Group = structure(c(1L,
3L, 2L), .Label = c("Neutral_1660", "Neutral_5300", "Neutral_7233"
), class = "factor"), Response = c(1L, 1L, 1L), Stimulus = c(1660L,
7233L, 5300L)), .Names = c("Subject", "Session", "Stimtype",
"Valence", "Block", "Image", "Group", "Response", "Stimulus"), class =
"data.frame", row.names = c(NA,
-3L))

Thank you, Maurits. It works with the sample data frames I created, but with the full data frames I get the following errors: *left_join(ExperimenterSID, eprime1, by = c("Subject" = "SID")) Error: `by` can't contain join column `Subject` which is missing from LHS Call `rlang::last_error()` to see a backtrace* **OR** *> Joined1 <- merge(ExperimenterSID, eprime1, by.x = "Subject", by.y = "SID") Error in fix.by(by.x, x) : 'by' must specify a uniquely valid column* — Lucy Toru, Feb 28 '19 at 02:44
@LucyToru Well this is obviously not easy to debug as we don't have access to your actual data. That's why sample data should be *representative* of your actual data. If your data looks different and/or has different column names, you need to make changes accordingly. In `left_join(ExperimenterSID, eprime1, by = c("Subject" = "SID"))` you are joining data by matching entries `ExperimenterSID$Subject` and `eprime1$SID`. You need to make sure that these columns exist. — Maurits Evers, Feb 28 '19 at 02:58
The data was representative of the actual data. I simply switched around the data frames, totally my mistake. Thanks for bearing with me. — Lucy Toru, Feb 28 '19 at 03:05
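
The errors in the comments above come from the order of the data frames: `by.x` (or the left-hand name in `by`) must name a column of the first argument, and `by.y` a column of the second. Below is a small sketch of that check in base R, again using hypothetical stand-in data frames `wide` and `long` rather than the asker's `ExperimenterSID` and `eprime1`.

```r
# Simplified stand-ins (hypothetical data, not the asker's real data frames).
wide <- data.frame(SID = 7301:7302, Experimenter = c(2L, 1L))
long <- data.frame(Subject = c(7301L, 7301L, 7302L),
                   Image   = c(12L, 17L, 22L))

# Verify that each key column exists in the data frame it is supposed to come from.
stopifnot("Subject" %in% names(long), "SID" %in% names(wide))

# Correct order: long first (has "Subject"), wide second (has "SID").
joined <- merge(long, wide, by.x = "Subject", by.y = "SID")

# Swapping the data frames without swapping by.x/by.y fails,
# because `wide` has no "Subject" column; capture the error message.
swapped <- tryCatch(
  merge(wide, long, by.x = "Subject", by.y = "SID"),
  error = function(e) conditionMessage(e)
)
```

Running `names()` on both data frames before joining is usually the quickest way to spot this kind of mix-up.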
