1

I search for a generic data frame update function like the sql-update that updates values in the first data frame in case the keys match with the keys in the second data frame. Is there a more generic way as in my example, maybe also by considering the value names? Something like a generic dplyr::update(df1, df2, by = "key") function?

library(tidyverse)
# example data frame
df1 <- as_data_frame(list(key = c(1,2,3,4,5,6,7,8,9),
                          v1 = c(11,12,13,14,15,16,17,18,19),
                          v2 = c(21,22,23,24,25,26,27,28,29),
                          v3 = c(31,32,33,34,35,36,37,38,39),
                          v4 = c(41,42,43,44,45,46,47,48,49)))
df2 <- as_data_frame(list(key = c(3,5,9),
                          v2 = c(231,252,293),
                          v4 = c(424,455,496)))

# update df1 with values from df2 where key match
org_names <- df1 %>% names()
df1 <- df1 %>% 
  left_join(df2, by = "key") %>% 
  mutate(v2 = ifelse(is.na(v2.y), v2.x, v2.y),
         v4 = ifelse(is.na(v4.y), v4.x, v4.y)) %>% 
  select(org_names)

> df1
# A tibble: 9 x 5
key    v1    v2    v3    v4
<dbl> <dbl> <dbl> <dbl> <dbl>
1     1    11    21    31    41
2     2    12    22    32    42
3     3    13   231    33   424
4     4    14    24    34    44
5     5    15   252    35   455
6     6    16    26    36    46
7     7    17    27    37    47
8     8    18    28    38    48
9     9    19   293    39   496
> 
Guido Berning
  • 187
  • 2
  • 12
  • I'm not aware of anything. However there is `dplyr::db_write_table` which has an argument to create a temporary table and `DBI::dbExecute` which you can use to execute an update using the newly created temporary table. – crazybilly Nov 16 '17 at 15:24
  • I also found a similar question https://stackoverflow.com/questions/45217477/efficient-left-join-and-subsequent-merge with an answer to use data.table – Guido Berning Nov 16 '17 at 15:28
  • Ah, I misread your question to be asking about making updates in an SQL database--you're just wanting to edit the local data frame object, right? – crazybilly Nov 16 '17 at 15:36

1 Answers1

1

1) %<>% Magrittr has the compound assignment pipe:

library(magrittr)

df1 %>% 
    { keys <- intersect(.$key, df2$key)
    .[match(keys, .$key), names(df2)] %<>% { df2[match(keys, df2$key), ] }
    .
}

which, for the problem under consideration, simplifies to this because all keys in df2 are in df1:

df1 %>% { .[match(df2$key, .$key), names(df2)]  %<>% { df2 }; . }

2) <- The basic R assignment operator could also be used in much the same way and, in fact, the code is shorter than (1):

df1 %>% 
    { keys <- intersect(.$key, df2$key)
    .[match(keys, .$key), names(df2)] <- df2[match(keys, df2$key), ]
    .
}

however, for the problem under consideration all keys in df2 are in df1 so it simplifies to:

df1 %>% { .[match(df2$key, .$key), names(df2)] <- df2; . }

3) mutate_cond Using mutate_cond defined in this SO post we can write the following.

df1 %>% mutate_cond(.$key %in% df2$key, v2 = df2$v2, v4 = df2$v4)

Note: The first two approaches work if the keys in df1 and df2 are each unique. The third additionally requires the keys be in the same order and every key in df2 be in df1. The problem in the question satisfies these.

Update: Have somewhat generalized the code in (1) and (2).

G. Grothendieck
  • 254,981
  • 17
  • 203
  • 341