I'm having trouble with R on OS X.
I have a set of files with German names and am seeing strange behavior. In the following example, the first 'Käse' was typed on the keyboard and the second was copied from the output of list.files():
names <- c('Käse', 'Käse')
grepl('Käse', names)
# [1] TRUE FALSE
After a lot of head-scratching I noticed that the umlauts were displayed slightly differently in the console.
Converting both strings to ASCII, with the non-convertible bytes shown in hex, finally revealed the difference:
iconv(names, 'UTF-8', 'ASCII', sub = 'byte')
# [1] "K<c3><a4>se" "Ka<cc><88>se"
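This was surprising, as I expected ä to be a single character. (It is not part of ASCII, which only covers codes 0 to 127; 132 is its code in the old DOS code page 437, and in Latin-1 it is 228.) The bytes show that the two strings encode the letter differently: <c3><a4> is the UTF-8 encoding of the precomposed ä (U+00E4), while a<cc><88> is a plain a followed by U+0308 COMBINING DIAERESIS.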
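To see the same difference without iconv(), a minimal base-R sketch can print each string's code points and bytes. The \u escapes are used so that the example does not depend on how the script file itself is encoded:

```r
# Two spellings of "Käse" that look identical in the console.
nfc <- "K\u00e4se"   # precomposed: U+00E4 LATIN SMALL LETTER A WITH DIAERESIS
nfd <- "Ka\u0308se"  # decomposed: "a" followed by U+0308 COMBINING DIAERESIS

utf8ToInt(nfc)       # 75 228 115 101
utf8ToInt(nfd)       # 75 97 776 115 101
nchar(c(nfc, nfd))   # 4 5 (the combining mark counts as its own character)
charToRaw(nfc)       # 4b c3 a4 73 65
charToRaw(nfd)       # 4b 61 cc 88 73 65
identical(nfc, nfd)  # FALSE
```

The raw bytes match the <c3><a4> and a<cc><88> sequences from the iconv() output, which is why grepl() treats the two strings as different.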
I also noticed that when I type the following on the keyboard
system('touch käse2')
the resulting file name is automatically converted to the second, decomposed form.
So my question is: how can I configure R so that the umlauts I type in regular expressions match those used in file names?
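One possible workaround, rather than a configuration setting, is to normalize the file names to the precomposed form before matching. The general tool for this is Unicode NFC normalization, for example stringi::stri_trans_nfc() from the stringi package. The sketch below instead uses a small hypothetical base-R helper, compose_umlauts(), which only handles the six German umlauts (ä, ö, ü, Ä, Ö, Ü) and assumes the names use the decomposed form seen above:

```r
# Hypothetical helper (not part of base R): rewrite decomposed umlauts
# ("a" + U+0308 COMBINING DIAERESIS) into the precomposed characters
# that the keyboard produces (U+00E4 and friends).
compose_umlauts <- function(x) {
  diaeresis <- intToUtf8(0x0308)
  composed <- c(a = 0xE4, o = 0xF6, u = 0xFC, A = 0xC4, O = 0xD6, U = 0xDC)
  for (base in names(composed)) {
    # fixed = TRUE: plain string replacement, no regex interpretation
    x <- gsub(paste0(base, diaeresis), intToUtf8(composed[[base]]), x,
              fixed = TRUE)
  }
  x
}

files <- c("K\u00e4se", "Ka\u0308se")
grepl("K\u00e4se", files)                   # TRUE FALSE
grepl("K\u00e4se", compose_umlauts(files))  # TRUE TRUE
```

Applied to real file names this would be grepl('Käse', compose_umlauts(list.files())). Other accented letters are left as they are by this helper, which is where full NFC normalization would be needed.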
The output of Sys.getlocale():
> Sys.getlocale()
[1] "de_AT.UTF-8/de_AT.UTF-8/de_AT.UTF-8/C/de_AT.UTF-8/de_AT.UTF-8"
Update
The behavior that bothers me the most is the following:
filename <- 'Käse.Rdata'
x <- 1
save(x, file = filename)
list.files(pattern=filename)
# character(0)
So the file name is not equal to the string that was used to create it.
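Since list.files(pattern = ...) matches against the decomposed names, another option is to leave the file names alone and make the pattern accept both spellings. A minimal sketch, using a hypothetical helper umlaut_pattern() that again only covers the six German umlauts:

```r
# Hypothetical helper: turn each precomposed umlaut in a regular expression
# into an alternation that also accepts the decomposed form, so that
# "ä" becomes "(ä|a" followed by U+0308 ")".
umlaut_pattern <- function(pattern) {
  diaeresis <- intToUtf8(0x0308)
  composed <- c(a = 0xE4, o = 0xF6, u = 0xFC, A = 0xC4, O = 0xD6, U = 0xDC)
  for (base in names(composed)) {
    ch <- intToUtf8(composed[[base]])
    alternation <- paste0("(", ch, "|", base, diaeresis, ")")
    pattern <- gsub(ch, alternation, pattern, fixed = TRUE)
  }
  pattern
}

p <- umlaut_pattern("K\u00e4se\\.Rdata")  # the dot is escaped for the regex
grepl(p, c("K\u00e4se.Rdata", "Ka\u0308se.Rdata"))  # TRUE TRUE
```

Assuming list.files() returns the decomposed name as described above, list.files(pattern = p) should then find the saved file.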
This seems to be Mac-specific: on my Windows machine it works as expected.