4

How to combines dataframe more easily?

I have three dataframes (table_base / table_a / table_b). I want to combine them by row to obtain the result as 'table_final'. Below is the code I have, it works, but it is a little bit complicated. How can I simplify it ? Actually, I will have more tables to join than just table_a and table_b.

    library(dplyr)
    table_base <- data.frame(cat=c("a","b","c","d"))
    
    table_a <- data.frame(cat=c("a","b"),
                          value=c(1,2))
    
    table_b <- data.frame(cat=c("a","c","d"),
                          value=c(7,9,10))


table_final <- table_base %>% 
  left_join(table_a,by='cat',fill=0) %>% 
  left_join(table_b,by='cat') %>% 
  mutate(value=if_else(!is.na(value.x),value.x,value.y)) %>% 
  select(cat,value)

enter image description here

zx8754
  • 52,746
  • 12
  • 114
  • 209
anderwyang
  • 1,801
  • 4
  • 18
  • 2
    Assuming you have a list of data frames, you can do `list(table_a, table_b) %>% reduce(left_join)`. If you want to do `fill=0` only to some tables, you need to treat them separatley – danlooo Nov 24 '21 at 09:38
  • 1
    This is a possible duplicate from [https://stackoverflow.com/questions/8091303/simultaneously-merge-multiple-data-frames-in-a-list](https://stackoverflow.com/questions/8091303/simultaneously-merge-multiple-data-frames-in-a-list) – Mata Nov 24 '21 at 09:45

2 Answers2

4

Using purrr::reduce to merge multiple dataframes, then use dplyr::coalesce to get first non-na value:

library(dplyr)
library(purrr)

list(table_base, table_a, table_b) %>% 
  reduce(left_join, by = "cat") %>% 
  mutate(value = coalesce(!!!select(., starts_with("value")))) %>% 
  select(cat, value)

#   cat value
# 1   a     1
# 2   b     2
# 3   c     9
# 4   d    10
zx8754
  • 52,746
  • 12
  • 114
  • 209
  • thanks, that's great , ”mutate(value = coalesce(!!!select(., starts_with("value"))))“,'coalesce' can find first not na value , but what's the usage '!!!' ? – anderwyang Nov 24 '21 at 10:08
  • @anderwyang see here: https://stackoverflow.com/a/61180370/680068 and here https://adv-r.hadley.nz/quasiquotation.html#unquoting-many-arguments – zx8754 Nov 24 '21 at 10:11
2

You just need to take appropriate rows from each table and bind them:

table_list <- list(table_a, table_b) 

table_list %>%
  map("cat") %>%
  map2(c(list(NULL), accumulate(head(., -1), union)), setdiff) %>%
  map2_dfr(table_list, ~filter(.y, cat %in% .x))
det
  • 5,013
  • 1
  • 8
  • 16