1

I would like to used nested full_join to merge several data frames together. In addition, I am hoping to be able to add suffixes to all of the columns so that when the data frames are merged each column name indicates which data frame it came from (e.g., a unique time identifier like T1, T2, ...).

x <- data.frame(i = c("a","b","c"), j = 1:3, h = 1:3, stringsAsFactors=FALSE)
y <- data.frame(i = c("b","c","d"), k = 4:6, h = 1:3, stringsAsFactors=FALSE)
z <- data.frame(i = c("c","d","a"), l = 7:9, h = 1:3, stringsAsFactors=FALSE)

full_join(x, y, by='i') %>% left_join(., z, by='I')

Is there a way to integrate the default suffix option so that I get a dataset with column names that look like:

column_names <- c("i", "j_T1", "h_T1", "k_T2", "h_T2", "l_T3", "h_T3")
Ronak Shah
  • 377,200
  • 20
  • 156
  • 213
Addison
  • 143
  • 7
  • What is your desired output in this case? – AnilGoyal Dec 05 '20 at 14:51
  • Do the numbers of the columns need to be in the order they occur? Could it be a different order sequence for H for example where there are 3 columns? column_names <- c("i", "j_T1", "h_T3", "k_T2", "h_T2", "l_T3", "h_T1"). Also is it right it is l_T3 as there is only 1 of them in this sample? – Zoë Turner Dec 05 '20 at 21:09

2 Answers2

3

Put the dataframes in a list, add the dataframe name as suffix and perform the join.

library(dplyr)
library(purrr)

lst(x, y, z) %>%
  imap(function(x, y) x %>% rename_with(~paste(., y, sep = '_'), -i)) %>%
  reduce(full_join, by = 'i')

#  i j_x h_x k_y h_y l_z h_z
#1 a   1   1  NA  NA   9   3
#2 b   2   2   4   1  NA  NA
#3 c   3   3   5   2   7   1
#4 d  NA  NA   6   3   8   2
Ronak Shah
  • 377,200
  • 20
  • 156
  • 213
1

I think this can be done by working with the column headers using purrr but I've used pivot_wider and pivot_longer to change the header names:

df <- x %>% 
  full_join(y, by = "i") %>% 
  full_join(z, by = "i") %>% 
  pivot_longer(cols = -i,
               names_to = "columns",
               values_to = "values") %>% # makes the column headers into a column 
which can be changed
  mutate(columns = str_replace(columns, ".x", "_T2"),
         columns = str_replace(columns, ".y", "_T3"),
         columns = case_when(!str_detect(columns, "T") ~ paste0(columns, "_T1"),
                             TRUE ~ columns)) %>% 
  pivot_wider(names_from = columns,
              values_from = values)

These don't match the listed headers but hopefully this code will help to get you started if the order is important and column l should be T3 (there was only 1 in this example).

Zoë Turner
  • 459
  • 5
  • 8