9

I have two strings that look the same but are not identical.

> t
[1] "2009_Manaus_Aerotáxi_crash"
> t2
[1] "2009_Manaus_Aerotáxi_crash"
> identical(t,t2)
[1] FALSE
> str(t)
 chr "2009_Manaus_Aerotaxi_crash""| __truncated__
> str(t2)
 chr "2009_Manaus_Aerotáxi_crash"

How can I force these two strings to be equal?

Thanks

ruthy_gg
  • Are they both character strings? Check with `str()`. – mtoto Feb 11 '16 at 12:30
  • It appears to be a data-type issue. Can you supply us with the output from `dput(list(t=t,t2=t2))`? – Heroka Feb 11 '16 at 12:36
  • `dput(list(t=t,t2=t2))` gives `structure(list(t = "2009_Manaus_Aerotáxi_crash", t2 = "2009_Manaus_Aerotáxi_crash"), .Names = c("t", "t2"))` – ruthy_gg Feb 11 '16 at 15:25
  • `charToRaw(t)`: `32 30 30 39 5f 4d 61 6e 61 75 73 5f 41 65 72 6f 74 61 cc 81 78 69 5f 63 72 61 73 68`; `charToRaw(t2)`: `32 30 30 39 5f 4d 61 6e 61 75 73 5f 41 65 72 6f 74 c3 a1 78 69 5f 63 72 61 73 68` – ruthy_gg Feb 11 '16 at 15:51
  • See https://stackoverflow.com/questions/23699271/force-character-vector-encoding-from-unknown-to-utf-8-in-r, in particular `stri_trans_general(x, "Latin-ASCII")` – Sam Firke Mar 01 '19 at 18:55
  • Possibly it's a duplicate of [this](https://stackoverflow.com/questions/20674577/how-to-compare-unicode-characters-that-look-alike) – PRAJIN PRAKASH Mar 18 '20 at 12:57
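
The byte dumps in the comments differ in exactly one place: `61 cc 81` (a plain "a" followed by the combining acute accent U+0301, the decomposed form) versus `c3 a1` (the single precomposed code point U+00E1). Here is a minimal sketch, assuming the stringi package is installed, that rebuilds both strings from explicit escapes and checks which Unicode normalization form each one is in. The variable names `decomposed` and `precomposed` are illustrative and do not come from the question.

```r
library(stringi)

# Illustrative names; the escapes mirror the byte dumps from the comments.
decomposed  <- "2009_Manaus_Aerota\u0301xi_crash"  # "a" + U+0301 (bytes 61 cc 81)
precomposed <- "2009_Manaus_Aerot\u00e1xi_crash"   # U+00E1 (bytes c3 a1)

# Byte-level comparison: the two strings are not the same sequence.
identical(decomposed, precomposed)

# Check whether each string is already in NFC (composed) form.
stri_trans_isnfc(decomposed)
stri_trans_isnfc(precomposed)

# Count code points; the decomposed string carries one extra (the accent).
stri_length(decomposed)
stri_length(precomposed)
```

If the two strings report different NFC status, as expected here, the mismatch is a normalization issue rather than a data-type one.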

2 Answers

4

If you write your data out to a CSV file, open it in a program like Notepad++, and turn on view > all characters, you will be able to see whether there is something at the ends of your strings, such as LF, \r, or \n. You will then have a better idea of what needs to be removed, and you can use the advice from the other answer (`stringi::stri_cmp()`) on a string that you know was not matching to confirm that you fixed it. For me the issue turned out to be spaces, and this solved my problem:

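library(dplyr)
library(readr)

own_dept_expect %>% 
    # replace every whitespace character (tab, newline, ...) with a plain space
    mutate(check_field = stringr::str_replace_all(check_field,"[:space:]"," ")) %>% 
    write_csv("C:/Users/me/Desktop/spaces_suck.csv")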

I verified in Notepad++ that it was all uniform now.
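
The same inspection can be done without leaving R. This is a rough sketch in base R only, using made-up values for `check_field`. Note that it trims and collapses whitespace instead of replacing each whitespace character one for one, so it is a variation on the approach above and not the same operation.

```r
# Hypothetical values with a trailing space, a tab, and a Windows line ending.
check_field <- c("Finance", "Finance ", "Finance\t", "Finance\r\n")

# charToRaw() exposes every byte, including ones that print as nothing.
lapply(check_field, charToRaw)

# Collapse runs of whitespace to one space, then trim both ends.
cleaned <- trimws(gsub("[[:space:]]+", " ", check_field))

# Compare the number of distinct values before and after cleaning.
length(unique(check_field))
length(unique(cleaned))
```

If the distinct count drops after cleaning, stray whitespace was at least part of the mismatch.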

Claudio Paladini
0

Consider using the `stri_compare` function from the stringi package (https://cran.r-project.org/web/packages/stringi/). It returns 0 if two strings are equal or canonically equivalent. See the package documentation for details.

In your case one would test it like this:

library(stringi)

t  <- "2009_Manaus_Aerotáxi_crash"
t2 <- "2009_Manaus_Aerotáxi_crash"
t3 <- "1111_Manaus_Aerotáxi_crash"

# stri_compare() returns 0 when the strings are equal or canonically equivalent
ifelse(stri_compare(t, t2) == 0, "Strings are equal", "Strings are different")
ifelse(stri_compare(t, t3) == 0, "Strings are equal", "Strings are different")

Hope this helps

selyunin
  • Thanks a lot selyunin. The thing is that I would like to convert one of the strings into the "identical" equivalent of the other, that is, alter t so that it becomes identical to t2. I need this because I have several strings like these that are not merged because they are not identical. – ruthy_gg Feb 11 '16 at 15:27
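
One way to get what this comment asks for is to bring both sides to the same Unicode normalization form before merging. The sketch below assumes the stringi package and uses `stri_trans_nfc()`, which composes "a" + U+0301 into the single code point U+00E1. The data frames `left` and `right` and their columns are hypothetical; only the key strings follow the byte dumps posted in the question's comments.

```r
library(stringi)

# Hypothetical example data; only the spelling of the key matters here.
left  <- data.frame(key = "2009_Manaus_Aerota\u0301xi_crash", a = 1,
                    stringsAsFactors = FALSE)
right <- data.frame(key = "2009_Manaus_Aerot\u00e1xi_crash", b = 2,
                    stringsAsFactors = FALSE)

# The keys are not identical yet, so the rows are not expected to match.
identical(left$key, right$key)

# Bring both key columns to NFC so equivalent strings share one byte sequence.
left$key  <- stri_trans_nfc(left$key)
right$key <- stri_trans_nfc(right$key)

# After normalization the keys are identical and the merge pairs the rows.
identical(left$key, right$key)
merged <- merge(left, right, by = "key")
nrow(merged)
```

Normalizing both columns, not just one, avoids having to know in advance which side holds the decomposed form.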