6

I have two dataframes (df1, df2). I want to fill in the AGE and SEX values from df1 to df2 conditioned on having the same ID between the two. I tried several ways using for-loop and checking subject ID match between the two data frame but I failed. The result should be as in the df3. I have a huge dataset, so I want a piece of code in R that can do this easily. I would appreciate your assistance in this. Thank you.

df1:
ID    AGE   SEX
90901   39  0
90902   28  0
90903   40  1

df2:
ID     AGE  SEX  Conc
90901   NA  NA    5
90901   NA  NA    10
90901   NA  NA    15
90903   NA  NA    30
90903   NA  NA    5
90902   NA  NA    2.45
90902   NA  NA    51
90902   NA  NA    1
70905   NA  NA    0.5

result:
df3:
ID     AGE  SEX  Conc
90901   39  0     5
90901   39  0     10
90901   39  0     15
90903   40  1    30
90903   40  1    5
90902   28  1    2.45
90902   28  0    51
90902   28  0     1
70905   NA  NA    0.5
Rich Scriven
  • 97,041
  • 11
  • 181
  • 245
Amer
  • 2,131
  • 3
  • 23
  • 38
  • possible duplicate of [How to join data frames in R (inner, outer, left, right)?](http://stackoverflow.com/questions/1299871/how-to-join-data-frames-in-r-inner-outer-left-right) – Henrik Aug 28 '14 at 06:50
  • 1
    Thank you all! That is amazing:-) I hope I come to a point where I understand and play with the apply family functions. They are very useful but needs hard thinking. Would be great if some body guide me to some useful resources. – Amer Aug 29 '14 at 15:39

3 Answers3

11

You could use match with lapply for this. If we iterate [[ with matching on the ID column of each of the original data sets over a vector of names, we can get the desired result.

nm <- c("AGE", "SEX")
df2[nm] <- lapply(nm, function(x) df1[[x]][match(df2$ID, df1$ID)])
df2
#      ID AGE SEX  Conc
# 1 90901  39   0  5.00
# 2 90901  39   0 10.00
# 3 90901  39   0 15.00
# 4 90903  40   1 30.00
# 5 90903  40   1  5.00
# 6 90902  28   0  2.45
# 7 90902  28   0 51.00
# 8 90902  28   0  1.00
# 9 70905  NA  NA  0.50

Note that this is also quite a bit faster than merge.

Rich Scriven
  • 97,041
  • 11
  • 181
  • 245
8

Try merge(df1, df2, by = "id"). This will merge your two data frames together. If your example is a good representation of your actual data, then you might want to go ahead and drop the age and sex columns from df2 before you merge.

df2$AGE <- NULL
df2$SEX <- NULL
df3 <- merge(df1, df2, by = "id")

If you need to keep rows from df2 even when you don't have a matching id in df1, then you do this:

df2 <- subset(df2, select = -c(AGE,SEX) )
df3 <- merge(df1, df2, by = "id", all.y = TRUE)

You can learn more about merge (or any r function) by typing ?merge() in your r console.

jed
  • 615
  • 3
  • 11
0

Here is a dplyr(v1.0.0) approach, which only overwrites NA values in df2:

rows_patch(df2 %>%mutate(across(AGE:SEX,as.integer)),df1)

     ID AGE SEX  Conc
1 90901  39   0  5.00
2 90901  39   0 10.00
3 90901  39   0 15.00
4 90903  40   1 30.00
5 90903  40   1  5.00
6 90902  28   0  2.45
7 90902  28   0 51.00
8 90902  28   0  1.00
9 70905  NA  NA  0.50
one
  • 3,121
  • 1
  • 4
  • 24