filling in columns with matching IDs from two dataframes in R

Question

I have two dataframes (df1, df2). I want to fill in the AGE and SEX values from df1 to df2 conditioned on having the same ID between the two. I tried several ways using for-loop and checking subject ID match between the two data frame but I failed. The result should be as in the df3. I have a huge dataset, so I want a piece of code in R that can do this easily. I would appreciate your assistance in this. Thank you.

df1:
ID    AGE   SEX
90901   39  0
90902   28  0
90903   40  1

df2:
ID     AGE  SEX  Conc
90901   NA  NA    5
90901   NA  NA    10
90901   NA  NA    15
90903   NA  NA    30
90903   NA  NA    5
90902   NA  NA    2.45
90902   NA  NA    51
90902   NA  NA    1
70905   NA  NA    0.5

result:
df3:
ID     AGE  SEX  Conc
90901   39  0     5
90901   39  0     10
90901   39  0     15
90903   40  1    30
90903   40  1    5
90902   28  1    2.45
90902   28  0    51
90902   28  0     1
70905   NA  NA    0.5

possible duplicate of [How to join data frames in R (inner, outer, left, right)?](http://stackoverflow.com/questions/1299871/how-to-join-data-frames-in-r-inner-outer-left-right) — Henrik, Aug 28 '14 at 06:50
Thank you all! That is amazing:-) I hope I come to a point where I understand and play with the apply family functions. They are very useful but needs hard thinking. Would be great if some body guide me to some useful resources. — Amer, Aug 29 '14 at 15:39

Rich Scriven · Accepted Answer · 2017-01-17T17:55:36.440

11

You could use match with lapply for this. If we iterate [[ with matching on the ID column of each of the original data sets over a vector of names, we can get the desired result.

nm <- c("AGE", "SEX")
df2[nm] <- lapply(nm, function(x) df1[[x]][match(df2$ID, df1$ID)])
df2
#      ID AGE SEX  Conc
# 1 90901  39   0  5.00
# 2 90901  39   0 10.00
# 3 90901  39   0 15.00
# 4 90903  40   1 30.00
# 5 90903  40   1  5.00
# 6 90902  28   0  2.45
# 7 90902  28   0 51.00
# 8 90902  28   0  1.00
# 9 70905  NA  NA  0.50

Note that this is also quite a bit faster than merge.

edited Jan 17 '17 at 17:55

answered Aug 28 '14 at 01:42

Rich Scriven

97,041
11
181
245

Is there a way to apply this method so that it won't overwrite any non-NA values in df2$AGE and df2$SEX? – user3594490 Mar 20 '18 at 18:47
Out of nowhere, this solution saved me hours of search. Thanks. – CaseebRamos Feb 27 '20 at 00:34
This worked, but the initial dataframe had 250Mb and now has 485Gb... – JMarcelino Jul 03 '20 at 15:27

jed · Answer 2 · 2014-08-28T04:31:57.377

Try merge(df1, df2, by = "id"). This will merge your two data frames together. If your example is a good representation of your actual data, then you might want to go ahead and drop the age and sex columns from df2 before you merge.

df2$AGE <- NULL
df2$SEX <- NULL
df3 <- merge(df1, df2, by = "id")

If you need to keep rows from df2 even when you don't have a matching id in df1, then you do this:

df2 <- subset(df2, select = -c(AGE,SEX) )
df3 <- merge(df1, df2, by = "id", all.y = TRUE)

You can learn more about merge (or any r function) by typing ?merge() in your r console.

score 0 · Answer 3 · answered Jun 08 '23 at 14:52

Here is a dplyr(v1.0.0) approach, which only overwrites NA values in df2:

rows_patch(df2 %>%mutate(across(AGE:SEX,as.integer)),df1)

     ID AGE SEX  Conc
1 90901  39   0  5.00
2 90901  39   0 10.00
3 90901  39   0 15.00
4 90903  40   1 30.00
5 90903  40   1  5.00
6 90902  28   0  2.45
7 90902  28   0 51.00
8 90902  28   0  1.00
9 70905  NA  NA  0.50

filling in columns with matching IDs from two dataframes in R

3 Answers3

Linked

Related