Assign values from one data frame to another based on subject numbers

Question

I have two data frames with one same column name subject. However, in the first data frame, there are many subeject 1 rows, but in the second data frame, there is only one subject 1 row. How can I combine these two data frames? Put it in other ways, how do I make a new column in the first data frame with values in the second data frames that match the subject number in the first data frame?

data frame 1

data1 <- structure(list( subject = c("subject1", "subject1", "subject1", "subject1", 
"subject1", "subject1", "subject1", "subject1", "subject1", "subject1", 
"subject1", "subject1", "subject1", "subject1", "subject10", "subject10", 
"subject10", "subject10", "subject10", "subject10", "subject10", "subject10", 
"subject10", "subject10", "subject10", "subject10", "subject10", "subject10"
)), row.names = c(NA, -28L), groups = structure(list( .rows = structure(list(
    c(7L, 9L, 11L, 15L, 18L, 19L, 21L, 25L, 27L, 28L), 4:5, 1L, 
    c(3L, 14L, 17L, 26L), c(8L, 22L, 24L), c(6L, 10L, 12L, 23L
    ), c(2L, 13L, 16L, 20L)), ptype = integer(0), class = c("vctrs_list_of", 
"vctrs_vctr", "list"))), row.names = c(NA, 7L), class = c("tbl_df", 
"tbl", "data.frame"), .drop = TRUE), class = c("grouped_df", 
"tbl_df", "tbl", "data.frame"))

data frame 2

data2 <- structure(list(subject = c("subject1", 
"subject10"), mean_per = c(63.5, 51.9285714285714)), row.names = c(NA, 
-2L), groups = structure(list(subject = c("subject1", "subject10"
), .rows = structure(list(1L, 2L), ptype = integer(0), class = c("vctrs_list_of", 
"vctrs_vctr", "list"))), row.names = 1:2, class = c("tbl_df", 
"tbl", "data.frame"), .drop = TRUE), class = c("grouped_df", 
"tbl_df", "tbl", "data.frame"))

score 2 · Answer 1 · answered Mar 02 '21 at 15:54

By using dplyr and the left_join function:

library(dplyr)
data1 %>% left_join(data2, by="subject")

From the man page:

left_join()

return all rows from x, and all columns from x and y. Rows in x with no match in y will have NA values in the new columns. If there are multiple matches between x and y, all combinations of the matches are returned

score 2 · Answer 2 · answered Mar 02 '21 at 15:57

I think you are looking for some function like merge

> merge(data1,data2)
     subject mean_per
1   subject1 63.50000
2   subject1 63.50000
3   subject1 63.50000
4   subject1 63.50000
5   subject1 63.50000
6   subject1 63.50000
7   subject1 63.50000
8   subject1 63.50000
9   subject1 63.50000
10  subject1 63.50000
11  subject1 63.50000
12  subject1 63.50000
13  subject1 63.50000
14  subject1 63.50000
15 subject10 51.92857
16 subject10 51.92857
17 subject10 51.92857
18 subject10 51.92857
19 subject10 51.92857
20 subject10 51.92857
21 subject10 51.92857
22 subject10 51.92857
23 subject10 51.92857
24 subject10 51.92857
25 subject10 51.92857
26 subject10 51.92857
27 subject10 51.92857
28 subject10 51.92857

akrun · Accepted Answer · 2021-03-02T16:51:10.713

2

Using data.table

library(data.table)
setDT(data1)[data2, mean_per2 := i.mean_per, on = .(subject)]

edited Mar 02 '21 at 16:51

answered Mar 02 '21 at 15:58

akrun

874,273
37
540
662

Sorry, is the second `mean_per` the column I want to add in to data 1? In my real data, there is already a `mean_per` in data 1, I need `mean_per` in data 2 as a new column. I do this: `setDT(data_ge)[CS_position1, CS_aq := mean_per, on = .(subject)]`, but I get a column same as `mean_per` in data 1. – Kenny Mar 02 '21 at 16:50
1

@Shepard in that case, you can assign as `setDT(data1)[data2, mean_per2 := i.mean_per, on = .(subject)]` – akrun Mar 02 '21 at 16:51

Assign values from one data frame to another based on subject numbers

3 Answers3