I know the title is odd, but I have no better way to summarise the question. Essentially, I have a list of character strings in R. However, when I use %in% to check whether strings taken from the list are in the list, some return TRUE while others return FALSE.

#to simplify I only show two:
my_list = c("Chapman University System", "State University of New York (SUNY) System")
"Chapman University System" %in% my_list #TRUE
"State University of New York (SUNY) System" %in% my_list #FALSE

Both the string I use for checking and the list are characters (the correct data type), and when I read the list I use encoding = "UTF-8".

Any suggestions as to why this occurs, or at least how I can troubleshoot it?

  • Currently not reproducible. If you can make the problem reproducible you'll surely get a solution. – s_baldur Oct 26 '22 at 08:58
  • I also cannot reproduce this (i.e. I get `TRUE` to both). Out of interest, what is the output of `utf8ToInt("State University of New York (SUNY) System")[!utf8ToInt(my_list[2]) == utf8ToInt("State University of New York (SUNY) System")]`? It should be `integer(0)`. – SamR Oct 26 '22 at 08:59
  • Thanks for your reply @sindri_baldur! My list is a CSV file. After reading it into R, I copy and paste the strings from the view page and use %in% to check, and I get different outcomes for different strings. How should I make that reproducible? Is there any way I can upload a file here? Sorry, I am a first-time user here. – yalepresident Oct 26 '22 at 09:02
  • @SamR the result is integer(0) – yalepresident Oct 26 '22 at 09:03
  • Strings can contain numerous special characters (such as the non-breaking space). You could try [coercing to ASCII](https://rdrr.io/rforge/stringi/man/stri_enc_toascii.html) before the comparison. – Roland Oct 26 '22 at 09:05
  • @yalepresident This means all the characters have the same UTF-8 encoding so it's not an encoding issue. In fact it suggests that the strings are exactly the same. Please read [How to make a great R reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example). If you copy/paste a sample of your data with `dput(head(df))` then others will be able to see if they can replicate the issue. – SamR Oct 26 '22 at 09:06
  • If I run `"State University of New York (SUNY) System" %in% my_list #FALSE` I get `TRUE` – TarJae Oct 26 '22 at 09:45
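
One way to act on the suggestions in the comments is to compare the stored value and the typed string code point by code point, then normalise whitespace before matching. The sketch below is not from the thread: the data are made up, with a non-breaking space and a trailing space planted in the second entry purely as an assumption about what a CSV import might contain, and `clean_strings` is a hypothetical helper name. It only illustrates the kind of invisible difference that can make %in% return FALSE.

```r
# Made-up data (assumption): the second entry hides a non-breaking space
# (U+00A0) before "(SUNY)" and a trailing regular space.
my_list <- c("Chapman University System",
             "State University of New York\u00a0(SUNY) System ")
target <- "State University of New York (SUNY) System"

# The strings print almost identically, yet the match fails.
target %in% my_list  # FALSE

# Differing lengths are a first hint of stray characters.
n_stored <- nchar(my_list[2])
n_target <- nchar(target)

# Compare code points over the shared length to avoid recycling.
cp_stored <- utf8ToInt(my_list[2])
cp_target <- utf8ToInt(target)
n <- min(length(cp_stored), length(cp_target))
diff_pos <- which(cp_stored[seq_len(n)] != cp_target[seq_len(n)])

# Code points of the stored string at the differing positions (160 is U+00A0).
diff_codes <- cp_stored[diff_pos]

# Hypothetical helper: replace non-breaking spaces with regular spaces and
# strip leading/trailing whitespace.
clean_strings <- function(x) {
  x <- gsub("\u00a0", " ", x, fixed = TRUE)
  trimws(x)
}

cleaned_match <- target %in% clean_strings(my_list)  # TRUE
```

If `diff_pos` comes back empty and the lengths agree for the real data, the difference is more likely in how the string was copied than in the stored value, in which case sharing `dput()` output, as suggested above, is the more useful next step.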
