1

I have two data frames with one same column name subject. However, in the first data frame, there are many subeject 1 rows, but in the second data frame, there is only one subject 1 row. How can I combine these two data frames? Put it in other ways, how do I make a new column in the first data frame with values in the second data frames that match the subject number in the first data frame?

data frame 1

data1 <- structure(list( subject = c("subject1", "subject1", "subject1", "subject1", 
"subject1", "subject1", "subject1", "subject1", "subject1", "subject1", 
"subject1", "subject1", "subject1", "subject1", "subject10", "subject10", 
"subject10", "subject10", "subject10", "subject10", "subject10", "subject10", 
"subject10", "subject10", "subject10", "subject10", "subject10", "subject10"
)), row.names = c(NA, -28L), groups = structure(list( .rows = structure(list(
    c(7L, 9L, 11L, 15L, 18L, 19L, 21L, 25L, 27L, 28L), 4:5, 1L, 
    c(3L, 14L, 17L, 26L), c(8L, 22L, 24L), c(6L, 10L, 12L, 23L
    ), c(2L, 13L, 16L, 20L)), ptype = integer(0), class = c("vctrs_list_of", 
"vctrs_vctr", "list"))), row.names = c(NA, 7L), class = c("tbl_df", 
"tbl", "data.frame"), .drop = TRUE), class = c("grouped_df", 
"tbl_df", "tbl", "data.frame"))

data frame 2

data2 <- structure(list(subject = c("subject1", 
"subject10"), mean_per = c(63.5, 51.9285714285714)), row.names = c(NA, 
-2L), groups = structure(list(subject = c("subject1", "subject10"
), .rows = structure(list(1L, 2L), ptype = integer(0), class = c("vctrs_list_of", 
"vctrs_vctr", "list"))), row.names = 1:2, class = c("tbl_df", 
"tbl", "data.frame"), .drop = TRUE), class = c("grouped_df", 
"tbl_df", "tbl", "data.frame"))
Kenny
  • 361
  • 1
  • 8

3 Answers3

2

By using dplyr and the left_join function:

library(dplyr)
data1 %>% left_join(data2, by="subject")

From the man page:

left_join()

return all rows from x, and all columns from x and y. Rows in x with no match in y will have NA values in the new columns. If there are multiple matches between x and y, all combinations of the matches are returned

MKR
  • 1,620
  • 7
  • 20
2

I think you are looking for some function like merge

> merge(data1,data2)
     subject mean_per
1   subject1 63.50000
2   subject1 63.50000
3   subject1 63.50000
4   subject1 63.50000
5   subject1 63.50000
6   subject1 63.50000
7   subject1 63.50000
8   subject1 63.50000
9   subject1 63.50000
10  subject1 63.50000
11  subject1 63.50000
12  subject1 63.50000
13  subject1 63.50000
14  subject1 63.50000
15 subject10 51.92857
16 subject10 51.92857
17 subject10 51.92857
18 subject10 51.92857
19 subject10 51.92857
20 subject10 51.92857
21 subject10 51.92857
22 subject10 51.92857
23 subject10 51.92857
24 subject10 51.92857
25 subject10 51.92857
26 subject10 51.92857
27 subject10 51.92857
28 subject10 51.92857
ThomasIsCoding
  • 96,636
  • 9
  • 24
  • 81
2

Using data.table

library(data.table)
setDT(data1)[data2, mean_per2 := i.mean_per, on = .(subject)]
akrun
  • 874,273
  • 37
  • 540
  • 662
  • Sorry, is the second `mean_per` the column I want to add in to data 1? In my real data, there is already a `mean_per` in data 1, I need `mean_per` in data 2 as a new column. I do this: `setDT(data_ge)[CS_position1, CS_aq := mean_per, on = .(subject)]`, but I get a column same as `mean_per` in data 1. – Kenny Mar 02 '21 at 16:50
  • 1
    @Shepard in that case, you can assign as `setDT(data1)[data2, mean_per2 := i.mean_per, on = .(subject)]` – akrun Mar 02 '21 at 16:51