How to merge multiple columns of a dataframe into one in R?

Question

I am once again asking a R/dataframe-related question. Recently, I have started implementing the merge() function, but I need help with merging some of the identical columns after merging different similar data frames together.

    library(dplyr)  
    library(tidyr)  
    library(stringr)  
    DF1 = data.frame(ID=c("A","B","C"), TagNo=c(10,20,20),Info=c("Metal","Plastic","Wood"))
    DF2 = data.frame(ID=c("A","D","E"), TagNo=c(10,30,100),Info=c("Wood","Metal","Wood"))
    DF3 = data.frame(ID=c("B","C","F"), TagNo=c(20,20,40),Info=c("Glass","Metal","Wood"))
    dfmerge <- merge(DF1,DF2,by="ID",all.x=TRUE, all.y=TRUE)
    dfmerge <- merge(dfmerge,DF3,by="ID",all.x=TRUE, all.y=TRUE)

The resulting merged dataframe looks like this:

ID	TagNo	Info.x	TagNo.y	Info.y	TagNo	Info
A	10	Metal	10	Wood	NA	NA
B	20	Plastic	NA	NA	50	Glass
C	20	Wood	NA	NA	20	Metal
D	NA	NA	30	Metal	NA	NA
E	NA	NA	100	Wood	NA	NA
F	NA	NA	NA	NA	40	Wood

I essentially want to condense the TagNo columns into just one Tag column. To highlight something important. Each sample has at least 1 TagNo, and if it has more than 1, it will always be the same. I don't ever have to worry about them being different.

This is the result I need:

ID	TagNo.x	Info.x	Info.y	Info
A	10	Metal	Wood	NA
B	20	Plastic	NA	Glass
C	20	Wood	NA	Metal
D	30	NA	Metal	NA
E	100	NA	Wood	NA
F	40	NA	NA	Wood

Does anyone know how to approach this problem? I appreciate any help. [Column headers aren't a big issue as I can simply fix that after with colnames(dfmerge) <- ...

Try with `coalesce` `library(dplyr);dfmerge$TagNo <- with(dfmerge, coalesce(TagNo.x, TagNo.y))` — akrun, Apr 26 '23 at 17:02
This link may be relevant: https://stackoverflow.com/questions/19253820/how-to-implement-coalesce-efficiently-in-r — jpsmith, Apr 26 '23 at 17:20

score 1 · Answer 1 · answered Apr 26 '23 at 18:38

1

Try this:

dfmerge <- merge(DF1,DF2,by=c("ID", "TagNo"),all.x=TRUE, all.y=TRUE)
dfmerge <- merge(dfmerge,DF3,by=c("ID", "TagNo"),all.x=TRUE, all.y=TRUE)

answered Apr 26 '23 at 18:38

Jessy

21
4

score 1 · Accepted Answer · answered Apr 26 '23 at 19:16

are you looking for dplyr joining functions ?

library(dplyr)  
DF1 = data.frame(ID=c("A","B","C"),TagNo=c(10,20,20),Info=c("Metal","Plastic","Wood"))
DF2 = data.frame(ID=c("A","D","E"), TagNo=c(10,30,100),Info=c("Wood","Metal","Wood"))
DF3 = data.frame(ID=c("B","C","F"), TagNo=c(20,20,40),Info=c("Glass","Metal","Wood"))

DF1%>%
full_join(DF2,by=c("ID","TagNo"))%>%
full_join(DF3,by=c("ID","TagNo"))          
#>   ID TagNo  Info.x Info.y  Info
#> 1  A    10   Metal   Wood  <NA>
#> 2  B    20 Plastic   <NA> Glass
#> 3  C    20    Wood   <NA> Metal
#> 4  D    30    <NA>  Metal  <NA>
#> 5  E   100    <NA>   Wood  <NA>
#> 6  F    40    <NA>   <NA>  Wood

^{Created on 2023-04-26 with reprex v2.0.2}

Perfect, precisely what I wanted especially since the data I am working with has so many more vars and columns! — user20330606, Apr 26 '23 at 20:20

How to merge multiple columns of a dataframe into one in R?

2 Answers2