
I am looking for an easy way to replace all punctuated letters with normal letters. For example, I want to change föó to foo. I can do this as such:

    gsub("ö|ó", "o", "föó")

However, it would probably be a lot of manual work to do this for every possible punctuated letter. Is there a way to do this automatically?

Sacha Epskamp

1 Answer


You can try some variation of this:

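    cleanString <- function(x) {
        # Transliterate accented characters to their closest ASCII equivalents
        tmp <- iconv(x, from = "UTF-8", to = "ASCII//TRANSLIT")
        # Drop everything that is not a letter (digits, spaces and punctuation too)
        gsub("[^[:alpha:]]", "", tmp)
    }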

    x = "föó"
    cleanString(x)
    [1] "foo"

The idea of using iconv comes from the question "Remove diacritics from a string".
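What iconv returns depends on the platform and on how the input is actually encoded, so a string that is not really UTF-8 can come back as NA. A possible variation, as a rough sketch only: re-encode the input with enc2utf8() before transliterating. The name cleanString2 is made up for illustration, and I am assuming the input carries a correct declared (or native) encoding; the transliterated output may still differ between systems.

```r
# Hypothetical variant of cleanString(); the name cleanString2 is illustrative only.
cleanString2 <- function(x) {
    # Re-encode each element to UTF-8 from its declared (or native) encoding
    utf8 <- enc2utf8(x)
    # Transliterate to ASCII; results are platform-dependent and
    # input that cannot be converted comes back as NA
    tmp <- iconv(utf8, from = "UTF-8", to = "ASCII//TRANSLIT")
    # Keep letters only, as in the original function
    gsub("[^[:alpha:]]", "", tmp)
}

# Works element-wise on a character vector
cleanString2(c("föó", "zöó"))
```

Since gsub() and iconv() are both vectorized, this takes a character vector as well as a single string.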

Greg
  • it should take a character vector (i.e. `x = c("föó", "zöó")`) as well. – Greg May 17 '11 at 17:20
  • Thanks, looks awesome. I get NA when I try it with `ü` though: `cleanString('ü')` – Sacha Epskamp May 17 '11 at 18:21
  • @Sacha - it works for me. I have a feeling it is going to be system-dependent, unfortunately. – Greg May 17 '11 at 18:42
  • @Sacha it seems to work for me on Windows if I change the from argument to `"latin1"` or just leave as `""` (system default). – James May 18 '11 at 09:37