
I am trying to detect strings containing special characters such as ä, ü, ö and ß.

I have a list of allowed characters, and I use it in a negated character class to detect any string that contains anything other than these:

 grepl("[^0-9a-zA-Z$%^*&]","aaüh")

However, this returns `FALSE`, so it fails to detect the special character ü.

How can I make it explicit that only Latin characters are allowed?

tzema
  • It returns `TRUE` for me. This might depend on the locale you're working in, though: for example, `a-z` might capture `ü` in a German locale. – thelatemail Aug 18 '22 at 03:16
  • Relevant info: https://stackoverflow.com/questions/19765610/when-does-locale-affect-rs-regular-expressions – thelatemail Aug 18 '22 at 03:37

1 Answer


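You have to convert the string first. I used the base R function iconv to re-encode the string as ASCII, replacing every character that cannot be represented with a placeholder. In this example, iconv produces "aa<U+00FC>h". The characters <, + and > in that placeholder are not in the allowed set, so the pattern matches.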

# Convert to ASCII; characters that cannot be represented become <U+xxxx>
gimme <- function(val) {
  iconv(val, from = "UTF-8", to = "ASCII", sub = "Unicode")
}

grepl("[^0-9a-zA-Z$%^*&]", gimme("aaüh"))
# [1] TRUE
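
As a complement, here is a minimal sketch of a check that avoids regex ranges entirely, since the meaning of a range such as `a-z` can depend on the locale. It compares each character against an explicitly listed set built from R's built-in `letters` and `LETTERS` constants. The names `allowed` and `has_disallowed` are hypothetical, introduced only for this illustration, and this is a different technique from the iconv conversion above.

```r
# Hypothetical allowed set, spelled out character by character so that no
# regex range (whose meaning may depend on the locale) is involved.
allowed <- c(0:9, letters, LETTERS, "$", "%", "^", "*", "&")

# Hypothetical helper: TRUE for each string that contains at least one
# character outside `allowed_chars`.
has_disallowed <- function(x, allowed_chars = allowed) {
  vapply(
    strsplit(x, "", fixed = TRUE),               # split into single characters
    function(chars) !all(chars %in% allowed_chars),
    logical(1)
  )
}

# "\u00fc" is the escape for ü, used here to keep the script itself ASCII.
has_disallowed(c("aa\u00fch", "abc123$"))
# [1]  TRUE FALSE
```

Because the comparison is done on individual characters rather than through a character class, the result should not change when the locale does.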
Kat
  • Thank you! I tried it and it still gives me FALSE... I have now changed the locale and language, as some others suggested, with `Sys.setlocale("LC_ALL","English"); Sys.setenv(LANG = "en_US.UTF-8")`, and it works without iconv. – tzema Aug 18 '22 at 06:59
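
To see whether the locale is what changes the result, it may help to print the session's settings next to the check. The following is only a diagnostic sketch: the variable names are made up for illustration, the output will differ between machines, and it does not change any setting.

```r
# Collect the locale categories that may influence how a range like a-z is read.
locale_info <- c(
  collate = Sys.getlocale("LC_COLLATE"),
  ctype   = Sys.getlocale("LC_CTYPE")
)
print(locale_info)

# TRUE when the session is running in a UTF-8 locale.
print(l10n_info()[["UTF-8"]])

# Run the original check; "\u00fc" is the escape for ü.
x <- "aa\u00fch"
result <- grepl("[^0-9a-zA-Z$%^*&]", x)
print(result)
```

Comparing this output before and after a call to `Sys.setlocale()` shows whether the locale change is what flips the result.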