
I have the following data frame:

df1 <- data.frame(
    Description=c("How are you- doing?", "will do it tomorrow otherwise: next week", "I will work hard to complete it for nextr week1 or  tomorrow", "I am HAPPY with this situation now","Utilising this approach can helpα'x-ray", "We need to use interseting <U+0452> books to solve the issue", "Not sure if we could do it appropriately.", "The schools and Universities are closed in f -blook for a week",  "Things are hectic here and we are busy"))

   


and I want to get the following table:

d <- data.frame(
    Description=c("Utilising this approach can helpa'x-ray", "How are you- doing", " We need to use interseting <U+0452> books to solve the issue ", " will do it tomorrow otherwise: next week ", " Things are hectic here and we are busy ", "I will work hard to complete it for nextr week1 or  tomorrow ", "The schools and Universities are closed in f -blook for a week",  " I am HAPPY with this situation now "," I will work hard to complete it for nextr week1 or  tomorrow"))
f2 <- read.table(text="B12 B6 B9
No Yes Yes
12 6 9
No No Yes
No No Yes
No No Yes
Yes No Yes
11 No Yes
12 11 P
No No Yes

", header=TRUE)

df3 <- cbind(d, f2)

As you can see, the Description column contains stray spaces, colons, and so on. The 1 after "week" is a subscript, and I was unable to fix it. I want to match df1 with df3 based on "Description". Can this be done in R?

1 Answer


We can use the stringdist joins from the fuzzyjoin package to match the data on 'Description'. The left join leaves NA in the df3 columns for rows of df1 that have no match, so we use na.omit to drop those rows from the final data frame.

na.omit(fuzzyjoin::stringdist_left_join(df1, df3, by = 'Description'))
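
As a possible refinement (a sketch, not part of the one-liner above), the stray leading, trailing, and doubled spaces can be cleaned up before joining, and the string distance can be kept in its own column to show how close each match was. The toy data frames `left` and `right` and the helper `normalise` are hypothetical names introduced only for this illustration. `max_dist` and `distance_col` are arguments of `stringdist_left_join`.

```r
library(fuzzyjoin)  # assumes fuzzyjoin (and its stringdist dependency) is installed

# Toy data modelled on the question; not the full df1/df3
left <- data.frame(Description = c("How are you- doing?",
                                   "I am HAPPY with this situation now",
                                   "Not sure if we could do it appropriately."))
right <- data.frame(Description = c("How are you- doing",
                                    " I am HAPPY with this situation now "),
                    B12 = c("No", "11"))

# Hypothetical helper: trim both ends and collapse runs of whitespace
normalise <- function(x) gsub("\\s+", " ", trimws(x))
left$Description  <- normalise(left$Description)
right$Description <- normalise(right$Description)

# Left join on approximate string equality; 'dist' records the edit distance
matched <- stringdist_left_join(left, right, by = "Description",
                                max_dist = 2, distance_col = "dist")

# Keep only rows that found a partner (unmatched rows have NA in 'dist')
matched <- matched[!is.na(matched$dist), ]
```

Filtering on the distance column rather than calling `na.omit` keeps matched rows that happen to contain NA in some other column. Raising `max_dist` tolerates more typos but also increases the risk of false matches, so it is worth inspecting `dist` on real data before settling on a threshold.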
Ronak Shah