
I am stuck trying to check whether certain strings occur in a character vector:

```r
> head(vector_of_strings)
[1] "    " "SC  " " 2.6" "WT  " " 1.0" "WT  "
> is.character(vector_of_strings)
[1] TRUE
> "SC  " %in% vector_of_strings
[1] FALSE
> "WT  " %in% vector_of_strings
[1] FALSE
```

The vector appears to contain these strings, but `%in%` reports that it does not. Should I convert the data type, or test for equality in some other way?
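One common cause of this symptom with scraped text is that the "spaces" are not ordinary spaces. The sketch below is a minimal illustration under that assumption: the vector `scraped` is hypothetical (not the asker's data) and uses non-breaking spaces (U+00A0) where the printed output suggests ordinary spaces (U+0020). It shows that the type is already fine and that `%in%` fails only because the characters differ, which `utf8ToInt()` makes visible.

```r
# Hypothetical data: assume the scraped strings contain non-breaking spaces
# (U+00A0) where they appear to contain ordinary spaces (U+0020)
scraped <- c("\u00a0\u00a0\u00a0\u00a0", "SC\u00a0\u00a0", "\u00a02.6", "WT\u00a0\u00a0")

is.character(scraped)   # TRUE: the data type is not the problem
"SC  " %in% scraped     # FALSE: %in% needs an exact character-by-character match

# The strings can look identical when printed but differ in their code points
utf8ToInt("SC  ")       # 83 67 32 32   (ordinary spaces)
utf8ToInt(scraped[2])   # 83 67 160 160 (non-breaking spaces)
```

Running `utf8ToInt()` on one element of the real vector is a quick way to check whether this assumption holds.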

Uwe Keim
  • Can you give `dput(vector_of_strings)`? – Axeman Sep 30 '18 at 13:18
  • `> dput(vector_of_strings) c(" ", "SC ", " 2.6", "WT ", " 1.0", "WT ", " ", "SC ", " 0.6", "WT ", " ", " 1 @", " ", " 1 @", " ", " ", " ", " ", " ", " ", " 3 @", " ", " 3 @", " ", " ", "SC ", " ", " 0.2", "WT ", " 0.4", "WT ", " ", " 0.0", "WT ", " ", " ", " ", " ", " ", " ", " ", "SC ", " ", " ", " ", " ", " ", "0")` @Axeman – Nicholas Frank Sep 30 '18 at 13:20
  • A typo: you have an extra space in your checks for "SC " and "WT ". – KoenV Sep 30 '18 at 13:28
  • The dput output in my comment is not displaying correctly on Stack Overflow; there was a copy/paste error. The strings do contain two spaces, even in the dput output. I also tried `"SC " %in% vector_of_strings` with a single space, as @KoenV suggested, and it doesn't work either. – Nicholas Frank Sep 30 '18 at 13:39
  • Use `trimws` to remove leading and trailing whitespace and avoid such issues; try `"SC" %in% trimws(vector_of_strings)` – Ronak Shah Sep 30 '18 at 13:58
  • Hi @RonakShah. Thank you for the feedback. However, it doesn't seem to work. `"SC" %in% trimws(vector_of_strings) [1] FALSE` – Nicholas Frank Sep 30 '18 at 14:06
  • Works for me with the `dput` output you shared: `vector_of_strings <- c(" ", "SC ", " 2.6", "WT ", " 1.0", "WT ", " ", "SC ", " 0.6", "WT ", " ", " 1 @", " ", " 1 @", " ", " ", " ", " ", " ", " ", " 3 @", " ", " 3 @", " ", " ", "SC ", " ", " 0.2", "WT ", " 0.4", "WT ", " ", " 0.0", "WT ", " ", " ", " ", " ", " ", " ", " ", "SC ", " ", " ", " ", " ", " ", "0"); "SC" %in% trimws(vector_of_strings)` returns `[1] TRUE` – Ronak Shah Sep 30 '18 at 14:08
  • For whatever reason, `trimws` does not change the vector on my machine: `> head(trimws(vector_of_strings)) [1] " " "SC " " 2.6" "WT " " 1.0" "WT "` – Nicholas Frank Sep 30 '18 at 14:10
  • There may be some invisible Unicode characters that `trimws` cannot remove, because they are not among the whitespace characters it matches (by default only space, tab, carriage return and newline). You may want to look at `?Encoding`, `stringi::stri_unescape_unicode` and `stringi::stri_escape_unicode`. – PavoDive Sep 30 '18 at 14:21
  • Thanks @PavoDive. I think you're correct. This is data scraped from a webpage and Encoding reveals the strings are in UTF-8 format. I will explore how to deal with this format. Any help would be appreciated! – Nicholas Frank Sep 30 '18 at 14:30
  • Use `stringi::stri_escape_unicode` to check the character codes (they should appear like `\U03B1` or ``), and then `gsub` the whole vector to get rid of them. Have a look at https://stackoverflow.com/questions/36108790/trouble-with-strings-with-u0092-unicode-characters – PavoDive Sep 30 '18 at 14:33
  • You can use `repair_encoding` too: https://stackoverflow.com/questions/37867465/r-rvest-encoding-errors-with-utf-8 – PavoDive Sep 30 '18 at 14:37
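The "check the character codes, then `gsub` the whole vector" approach from the comments can be sketched in base R. This is a minimal illustration that again assumes the offending characters are non-breaking spaces (U+00A0); the vector `scraped` is hypothetical. Step 1 lists every non-ASCII code point in the vector, so the assumption can be checked against real data before anything is replaced.

```r
# Hypothetical scraped data: assume the "spaces" are non-breaking spaces (U+00A0)
scraped <- c("\u00a0\u00a0\u00a0\u00a0", "SC\u00a0\u00a0", "\u00a02.6",
             "WT\u00a0\u00a0", "\u00a01.0")

# 1. List the non-ASCII code points that occur anywhere in the vector
codes <- unique(unlist(lapply(scraped, utf8ToInt)))
sprintf("U+%04X", codes[codes > 127])   # "U+00A0"

# 2. Default trimws() only strips [ \t\r\n], so the NBSPs survive
"SC" %in% trimws(scraped)   # FALSE

# 3a. Replace NBSPs with ordinary spaces across the whole vector, then trim
cleaned <- trimws(gsub("\u00a0", " ", scraped, fixed = TRUE))
"SC" %in% cleaned           # TRUE

# 3b. Alternatively (R >= 3.6.0), widen trimws()'s whitespace class;
#     the PCRE classes \h and \v also match NBSP
cleaned2 <- trimws(scraped, whitespace = "[\\h\\v]")
identical(cleaned, cleaned2)   # TRUE
```

Option 3a only targets the characters you name, which keeps the change explicit. Option 3b also covers other Unicode horizontal spaces, but it removes them only at the ends of each string. If step 1 reports characters that are not spaces at all, such as a zero-width space (U+200B), they would need to be added to the `gsub` pattern.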

0 Answers