Reorganize and match data sets by column in R

Question

this is my first post so please excuse any errors.

I have two data sets that I want to merge by sample name, the issue is that the second data sets has extra labeling in the sample name and is not in the same order as the first data set.

Clinical                                                                  
Patient, Cell Count                                                            
BB-01-D1    7
BB-02-D1    4
BB-04-D30   2

Flow                                                                      
Patient,          Cell Count                                                     
 2-5-19_BB-01-D1     7 
 3-15-19_BB-04-D30   2
 2-6-19_BB-02-D1     4

I'm wondering if there is a way to combine and match if the column "Patient" contains partially the same name, or if there is a way to get rid of the extra labeling in all of the rows in the Patient column so I can simply reorder.

Thank you in advance.

Please try to share a minimum, reproducible example of what you have tried so far — nircraft, Jan 15 '19 at 20:24
Take a look at `stringr` or do some regex with base functions. — NelsonGon, Jan 15 '19 at 20:36
Possible duplicate of [How can I match fuzzy match strings from two datasets?](https://stackoverflow.com/questions/26405895/how-can-i-match-fuzzy-match-strings-from-two-datasets) — divibisan, Jan 15 '19 at 20:42

NelsonGon · Accepted Answer · 2019-01-15T20:59:36.473

Here is a possibility:

library(tidyverse)
df1<-read.table(text="Patient Cell Count                                                            
BB-01-D1    7
                BB-02-D1    4
                BB-04-D30   2",header=T,fill=T)
df1<-df1[,-ncol(df1)]
df2<-read.table(text="Patient,          Cell Count                                                     
 2-5-19_BB-01-D1     7 
                3-15-19_BB-04-D30   2
                2-6-19_BB-02-D1     4",header=T,fill=T)
df2<-df2[,-ncol(df2)]
df2<-df2 %>% 
  mutate(Patient.=str_remove_all(df2$Patient.,".*(?<=_)"))

Then from here proceed as you may wish

cbind(df1,df2) #Cell Count labels lost due to reading errors. Will work on 
                                    #my data import
     Patient Cell  Patient. Cell
1  BB-01-D1    7   BB-01-D1    7
2  BB-02-D1    4   BB-04-D30    2
3 BB-04-D30    2   BB-02-D1    4

OR:

df1<-df1 %>% 
  mutate(Patient=as.factor(Patient))
df2<-df2 %>% 
  rename(Patient=Patient.) %>% 
  mutate(Patient=as.factor(Patient))
merged<-df1 %>% 
  left_join(df2,"Patient") 
names(merged)<-c("Patient","Clinical","Flow")

Result:

 Patient      Clinical Flow
1  BB-01-D1        7    7
2  BB-02-D1        4    4
3 BB-04-D30        2    2

Reorganize and match data sets by column in R

1 Answers1