Update rows of data frame in R

Question

Suppose I start with a data frame:

 ID Measurement1 Measurement2
  1           45          104
  2           34           87
  3           23           99
  4           56           67
...

Then I have a second data frame which is meant to be used to update records in the first:

 ID Measurement1 Measurement2
  2           10           11
  4           21           22

How do I use R to end up with:

 ID Measurement1 Measurement2
  1           45          104
  2           10           11
  3           23           99
  4           21           22
...

The data frames in reality are very large datasets.

akrun · Accepted Answer · 2015-10-02T02:59:23.740

We can use match to get the row index. Using that index to subset the rows, we replace the 2nd and 3rd columns of the first dataset with the corresponding columns of second dataset.

ind <- match(df2$ID, df1$ID)
df1[ind, 2:3] <- df2[2:3]
df1
#  ID Measurement1 Measurement2
#1  1           45          104
#2  2           10           11
#3  3           23           99
#4  4           21           22

Or we can use data.table to join the dataset on the 'ID' column (after converting the first dataset to 'data.table' i.e. setDT(df1)), and assign the 'Cols' with the 'iCols' from the second dataset.

 library(data.table)#v1.9.6+
 Cols <- names(df1)[-1]
 iCols <- paste0('i.', Cols)
 setDT(df1)[df2, (Cols) := mget(iCols), on= 'ID'][]
 #   ID Measurement1 Measurement2
 #1:  1           45          104
 #2:  2           10           11
 #3:  3           23           99
 #4:  4           21           22

data

df1 <- structure(list(ID = 1:4, Measurement1 = c(45L, 34L, 23L, 56L), 
Measurement2 = c(104L, 87L, 99L, 67L)), .Names = c("ID", 
"Measurement1", "Measurement2"), class = "data.frame",
 row.names = c(NA, -4L))

df2 <-  structure(list(ID = c(2L, 4L), Measurement1 = c(10L, 21L),
 Measurement2 = c(11L, 
 22L)), .Names = c("ID", "Measurement1", "Measurement2"),
 class = "data.frame", row.names = c(NA, -2L))

score 8 · Answer 2 · answered Oct 02 '15 at 03:46

8

library(dplyr)

df1 %>%
  anti_join(df2, by = "ID") %>%
  bind_rows(df2) %>%
  arrange(ID)

answered Oct 02 '15 at 03:46

bramtayl

4,004
2
11
18

This also works as a good workaround for rows_upsert() which is having a problems when duplicates are present – rkabuk May 10 '23 at 17:17

score 7 · Answer 3 · answered Jun 12 '20 at 15:27

dplyr 1.0.0 introduced a family of SQL-inspired functions for modifying rows. In this case you can now use rows_update():

library(dplyr)

df1 %>%
  rows_update(df2, by = "ID")

  ID Measurement1 Measurement2
1  1           45          104
2  2           10           11
3  3           23           99
4  4           21           22

Update rows of data frame in R

3 Answers3

data

Linked

Related