6

I've looked around and I can't find a simple answer to this. How do I do what in SQL would be an update table? For example:

> df1 = data.frame(id=seq(1:3), v1=c("a", "b", NA))
> df1
  id   v1
1  1    a
2  2    b
3  3 <NA>
> df2 = data.frame(id=seq(1:3), v2=c("z", "y", "c"))
> df2
  id v2
1  1  z
2  2  y
3  3  c

How do I update df1 with values from v2 in v1, but only when id matches and when id > 2? I've looked at data.table, but can't figure out the := syntax, and hoping there is something simple in base R? Desired output would be:

> df1
  id   v1
1  1    a
2  2    b
3  3    c
Paul
  • 475
  • 8
  • 17

2 Answers2

2

SQLite One can use an update in sqlite via sqldf:

library(sqldf)

sqldf(c("update df1 
         set v1 = (select v2 from df2 where df2.id = df1.id) 
         where id > 2", 
        "select * from df1"))

which gives:

  id v1
1  1  a
2  2  b
3  3  c

MySQL This works in MySQL:

library(RMySQL)
library(sqldf)

sqldf(c("update df1 
  left join df2 on (df1.id = df2.id and df1.id > 2)
  set df1.v1 = coalesce(df2.v2, df1.v1)",
  "select * from df1")
)

giving:

  id v1
1  1  a
2  2  b
3  3  c

base R This also works. The first two lines are just to convert v1 and v2 to character and they can be avoided if v1 and v2 were already character:

df1c <- transform(df1, v1 = as.character(v1))
df2c <- transform(df2, v2 = as.character(v2))
transform(df1c, v1 = ifelse(id > 2, df2c[match(id, df2c$id), "v2"], v1))

Update Have incorporated comments and added base R solution.

G. Grothendieck
  • 254,981
  • 17
  • 203
  • 341
  • 2
    Maybe this is bordering on an opinion / theoretical question.. but why isn't there a nice function in R to do what I thought would be a common & necessary database (or even dataframe) operation? There are so many great functions in R to do almost everything else, I guess I'm just a bit surprised that this is complex. – Paul Mar 19 '14 at 19:29
  • Thanks for this - I think I might go this route.. seems much simpler to keep track of what I'm trying to match without brackets everywhere – Paul Mar 19 '14 at 19:30
  • @Grothendieck Thanks for this sqlite solution. I think the sql could be simplified. set-part: `set v1 = (select v2 from df2 where df2.id = df1.id)`. where-part: `where id > 2`. – giordano Aug 17 '16 at 06:42
1

Updated to work when there are ids present in df1 not in df2, and also if orders are different. This works so long as there is only one id column:

df1 <- data.frame(id=seq(1:5), v1=c("a", "b", NA, NA, NA), stringsAsFactors=F)
df2 <- data.frame(id=seq(1:3), v2=c("z", "y", "c"), stringsAsFactors=F)

df1[df1$id > 2, -1] <- df2[df1$id[df1$id > 2], -1]
df1

Produces:

  id   v1
1  1    a
2  2    b
3  3    c
4  4 <NA>
5  5 <NA>

Here is a simple solution that works so long as both data frames have the same id set:

df1[df1$id > 2, ] <- df2[df1$id > 2, ]

Produces:

  id v1
1  1  a
2  2  b
3  3  c

Big note though, v1 and v2 need to be character, so run this before as they are factor by default:

df1$v1 <- as.character(df1$v1)    
df2$v2 <- as.character(df2$v2)

If you need to join on multiple columns or if the ids in one table don't all exist in the other you can use merge or data.table to get both variables on one table, and then construct the new column by combining the columns with ifelse.

BrodieG
  • 51,669
  • 9
  • 93
  • 146
  • 1
    ah, ok there are extra columns in both dfs, and there are extra ids in each id column. how would the ifelse syntax work for each row? There isn't a more direct approach? It is so easy in sql.. – Paul Mar 18 '14 at 22:25
  • something like this? `df1$v1[is.na(df1$v1)] = df1$v2[is.na(df1$v1)]` – Paul Mar 18 '14 at 22:27
  • @Paul, see update. Note that multiple data columns isn't a problem. This stops working if you have multiple id columns. – BrodieG Mar 18 '14 at 22:40