1

I am once again asking a R/dataframe-related question. Recently, I have started implementing the merge() function, but I need help with merging some of the identical columns after merging different similar data frames together.

    library(dplyr)  
    library(tidyr)  
    library(stringr)  
    DF1 = data.frame(ID=c("A","B","C"), TagNo=c(10,20,20),Info=c("Metal","Plastic","Wood"))
    DF2 = data.frame(ID=c("A","D","E"), TagNo=c(10,30,100),Info=c("Wood","Metal","Wood"))
    DF3 = data.frame(ID=c("B","C","F"), TagNo=c(20,20,40),Info=c("Glass","Metal","Wood"))
    dfmerge <- merge(DF1,DF2,by="ID",all.x=TRUE, all.y=TRUE)
    dfmerge <- merge(dfmerge,DF3,by="ID",all.x=TRUE, all.y=TRUE)
    

The resulting merged dataframe looks like this:

ID TagNo Info.x TagNo.y Info.y TagNo Info
A 10 Metal 10 Wood NA NA
B 20 Plastic NA NA 50 Glass
C 20 Wood NA NA 20 Metal
D NA NA 30 Metal NA NA
E NA NA 100 Wood NA NA
F NA NA NA NA 40 Wood

I essentially want to condense the TagNo columns into just one Tag column. To highlight something important. Each sample has at least 1 TagNo, and if it has more than 1, it will always be the same. I don't ever have to worry about them being different.

This is the result I need:

ID TagNo.x Info.x Info.y Info
A 10 Metal Wood NA
B 20 Plastic NA Glass
C 20 Wood NA Metal
D 30 NA Metal NA
E 100 NA Wood NA
F 40 NA NA Wood

Does anyone know how to approach this problem? I appreciate any help. [Column headers aren't a big issue as I can simply fix that after with colnames(dfmerge) <- ...

user20330606
  • 105
  • 7
  • 2
    Try with `coalesce` `library(dplyr);dfmerge$TagNo <- with(dfmerge, coalesce(TagNo.x, TagNo.y))` – akrun Apr 26 '23 at 17:02
  • This link may be relevant: https://stackoverflow.com/questions/19253820/how-to-implement-coalesce-efficiently-in-r – jpsmith Apr 26 '23 at 17:20

2 Answers2

1

Try this:

dfmerge <- merge(DF1,DF2,by=c("ID", "TagNo"),all.x=TRUE, all.y=TRUE)
dfmerge <- merge(dfmerge,DF3,by=c("ID", "TagNo"),all.x=TRUE, all.y=TRUE)
Jessy
  • 21
  • 4
1

are you looking for dplyr joining functions ?

library(dplyr)  
DF1 = data.frame(ID=c("A","B","C"),TagNo=c(10,20,20),Info=c("Metal","Plastic","Wood"))
DF2 = data.frame(ID=c("A","D","E"), TagNo=c(10,30,100),Info=c("Wood","Metal","Wood"))
DF3 = data.frame(ID=c("B","C","F"), TagNo=c(20,20,40),Info=c("Glass","Metal","Wood"))

DF1%>%
full_join(DF2,by=c("ID","TagNo"))%>%
full_join(DF3,by=c("ID","TagNo"))          
#>   ID TagNo  Info.x Info.y  Info
#> 1  A    10   Metal   Wood  <NA>
#> 2  B    20 Plastic   <NA> Glass
#> 3  C    20    Wood   <NA> Metal
#> 4  D    30    <NA>  Metal  <NA>
#> 5  E   100    <NA>   Wood  <NA>
#> 6  F    40    <NA>   <NA>  Wood

Created on 2023-04-26 with reprex v2.0.2

Wael
  • 1,640
  • 1
  • 9
  • 20