Removing the special symbols in data.frame column values

Question

I have two data frames, each with a column called name.

df1:

name  
@one2  
!iftwo  
there_2_go  
come&go

df1 = structure(list(name = c("@one2", "!iftwo", "there_2_go", "come&go")),.Names = c("name"), row.names = c(NA, -4L), class = "data.frame")

df2:

name  
One2  
IfTwo#  
there-2-go  
come.go



df2 = structure(list(name = c("One2", "IfTwo#", "there-2-go", "come.go")),.Names = c("name"), row.names = c(NA, -4L), class = "data.frame")

Comparing the two data frames for mismatches with %in% is cumbersome because of the special symbols. Removing the symbols with stringr might help, but how exactly can stringr functions be combined with %in% to display the mismatches between the two?

I have already used mutate() with tolower() to convert everything to lowercase, as follows:

library(dplyr)
df1 <- mutate(df1, name = tolower(name))
df2 <- mutate(df2, name = tolower(name))

Current output of comparison:

df2[!(df2 %in% df1),]
[1] "one2"       "iftwo#"     "there-2-go" "come.go"

Expected output, since the contents are essentially the same apart from the special symbols:

 df2[!(df2 %in% df1),]
 character(0)

Question: how do we ignore the symbols in the contents of the data frames?

did you mistakenly attach `df` twice instead of df and df2? Also what is your expected output? — Sotos, May 05 '17 at 12:03
Oh that is a mistake while copying from R console to stack over flow — anmonu, May 05 '17 at 12:04
Possible duplicate of http://stackoverflow.com/questions/2231993/merging-two-data-frames-using-fuzzy-approximate-string-matching-in-r — zx8754, May 05 '17 at 12:11

score 2 · Accepted Answer · answered May 05 '17 at 12:48


Here it is in a function,

f1 <- function(df1, df2){
  i1 <- tolower(gsub('[[:punct:]]', '', df1$name))
  i2 <- tolower(gsub('[[:punct:]]', '', df2$name))
  d1 <- sapply(i1, function(i) grepl(paste(i2, collapse = '|'), i))
  return(!d1)
}

f1(df1, df2)
#    one2    iftwo there2go   comego 
#   FALSE    FALSE    FALSE    FALSE 

#or use it for indexing,

df2[f1(df1, df2),]
#character(0)
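
Note that grepl() in the function above does a regular-expression substring match. If an exact match on the cleaned names is what is wanted, the same idea can be sketched with %in% directly, as the question asks. This is only a minimal sketch: normalize_name is a hypothetical helper name that is not part of the answer, and the data frames are rebuilt here so the snippet runs on its own.

```r
# Hypothetical helper (not from the answer): strip punctuation, then lowercase
normalize_name <- function(x) tolower(gsub("[[:punct:]]", "", x))

df1 <- data.frame(name = c("@one2", "!iftwo", "there_2_go", "come&go"),
                  stringsAsFactors = FALSE)
df2 <- data.frame(name = c("One2", "IfTwo#", "there-2-go", "come.go"),
                  stringsAsFactors = FALSE)

# TRUE for rows of df2 whose cleaned name has no exact counterpart in df1
mismatch <- !(normalize_name(df2$name) %in% normalize_name(df1$name))

# Original (uncleaned) names in df2 that are not matched in df1
df2[mismatch, "name"]
#character(0)
```

Because the flags are computed from df2's own rows, they line up with df2 when used for indexing.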


Sotos


this one is great @Sotos, almost my requirement, but how about special symbols if any like below in the frames are present: 10��NORTH, it will throw error right tolower(gsub('[[:punct:]]', ..... – anmonu May 05 '17 at 12:52

Maybe [this](http://stackoverflow.com/questions/35639317/r-how-to-remove-very-special-characters-in-strings) can help...? I can not test as you have not provided data for those cases – Sotos May 05 '17 at 12:55
even I dont have such data :) was curious. this gsub is completely a new thing for me today, Thanks. will explore more ! – anmonu May 05 '17 at 12:57
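
For the non-ASCII case raised in the comments, one possible approach is to drop everything that is not an ASCII letter or digit before comparing. This is an untested-on-real-data sketch, since no such data was provided: clean_ascii is a hypothetical helper name, the input is assumed to be UTF-8, and the degree sign in the example is only a stand-in for the unknown character in the comment.

```r
# Hypothetical helper: keep only ASCII letters and digits, then lowercase
clean_ascii <- function(x) {
  # Assumption: x is UTF-8; bytes that cannot be converted to ASCII are
  # replaced with the empty string given in `sub`
  x <- iconv(x, from = "UTF-8", to = "ASCII", sub = "")
  # Remove any remaining character that is not a letter or digit
  tolower(gsub("[^A-Za-z0-9]", "", x))
}

cleaned <- clean_ascii(c("10\u00b0NORTH", "IfTwo#"))
cleaned
```

The cleaned vector can then be compared with %in% in the same way as the punctuation-stripped names. Note that this also discards accented letters, which may or may not be acceptable for a given data set.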
