I have two data frames, shown below, that I want to merge on the Family column. The count for each family should be added to every gene in the first data frame, and genes whose family does not appear in the second data frame should be kept in the result (with a count of 0) rather than dropped.

#first df
Family <- c("LET-7","LET-7","LET-7","MIR-10","MIR-103","MIR-124","MIR-124","MIR-124")
Sequence <- c("ATCGGCA","ATGCTAC","ATCGGCA","ATCGTTT","TGAGGAG","TGATCAG","AATTCAG","AATTCAG")
my_data_frame <- data.frame(Family,Sequence)
#second df
counts <- c("2","3")
Family <- c("LET-7","MIR-124")
countdf <- data.frame(Family,counts)

This is the output that I want:

Family <- c("LET-7","LET-7","LET-7","MIR-10","MIR-103","MIR-124","MIR-124","MIR-124")
Counts <- c("2","2","2","0","0","3","3","3")
Sequence <- c("ATCGGCA","ATGCTAC","ATCGGCA","ATCGTTT","TGAGGAG","TGATCAG","AATTCAG","AATTCAG")
newdf <- data.frame(Family,Counts,Sequence)
Apex
  • You might want to initialize your data.frames with `stringsAsFactors = FALSE`; that prevents some inconveniences later on – SebSta Feb 07 '20 at 08:54

1 Answer

Solution using the dplyr package:

library(dplyr)
newdf_dplyr <- my_data_frame %>%
  left_join(countdf, by = "Family")

Solution using base R:

newdf_base <- merge(my_data_frame, countdf, by="Family", all.x=TRUE)
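
Both joins leave NA in the counts column for families that are missing from countdf (MIR-10 and MIR-103), whereas the desired output in the question shows "0" there. A minimal base R sketch of one way to fill those gaps is given below. It assumes the data frames are built with `stringsAsFactors = FALSE`, as the comment on the question suggests, so that counts is a character column, and it keeps the lower-case column name `counts` from countdf rather than the `Counts` used in the desired output.

```r
# Example data from the question, built with stringsAsFactors = FALSE
# (an assumption here) so that every column is character, not factor.
my_data_frame <- data.frame(
  Family   = c("LET-7", "LET-7", "LET-7", "MIR-10", "MIR-103",
               "MIR-124", "MIR-124", "MIR-124"),
  Sequence = c("ATCGGCA", "ATGCTAC", "ATCGGCA", "ATCGTTT", "TGAGGAG",
               "TGATCAG", "AATTCAG", "AATTCAG"),
  stringsAsFactors = FALSE
)
countdf <- data.frame(
  Family = c("LET-7", "MIR-124"),
  counts = c("2", "3"),
  stringsAsFactors = FALSE
)

# Left join: keep every row of my_data_frame, matched or not.
newdf_base <- merge(my_data_frame, countdf, by = "Family", all.x = TRUE)

# Families absent from countdf come back as NA; replace them with "0".
newdf_base$counts[is.na(newdf_base$counts)] <- "0"

# Reorder the columns to match the layout asked for in the question.
newdf_base <- newdf_base[, c("Family", "counts", "Sequence")]
```

The same replacement should work on the dplyr result, for example with `coalesce(counts, "0")` inside `mutate()`. If numeric counts are preferred, converting the column with `as.numeric()` before filling with 0 is an option.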
dario