In R, I have vectors like this:
TEST <- c("BlAA¶schl, G", "ThAA¶ni, A.")
where "BlAA¶schl" should be "Blöschl", and "ThAA¶ni" should be "Thöni".
Similar problems occur throughout the whole dataset. I don't know what this is called (maybe "non-ASCII characters"?).
Based on this response, others seem to have tried this code successfully:
Encoding(TEST) <- 'latin1'
stringi::stri_trans_general(TEST, 'Latin-ASCII')
But in my case, nothing changes.
What can I do to convert character sequences like "AA¶" to "ö"?
EDIT: The key problem, it seems, is that this is a "double mojibake", as JosefZ mentioned in the comments.
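To make the "double mojibake" idea concrete, here is a minimal sketch of how I understand it. It assumes the damage came from UTF-8 bytes being misread as Windows-1252 and saved as UTF-8 again, twice in a row. That is only an assumption about my data, and the strings in TEST above may have been damaged differently. The helper names garble_once and repair_once are made up for illustration and do not come from any package, and the encoding name "CP1252" may need a different spelling on some platforms.

```r
# Hypothetical helpers, not from any package.
# Assumption: UTF-8 bytes were misread as Windows-1252 (CP1252) and
# re-saved as UTF-8, and this happened twice.

# Simulate one round of damage: interpret the UTF-8 bytes as CP1252.
garble_once <- function(x) iconv(x, from = "CP1252", to = "UTF-8")

# Undo one round: write the characters back out as CP1252 bytes,
# then declare that those bytes are really UTF-8.
repair_once <- function(x) {
  y <- iconv(x, from = "UTF-8", to = "CP1252")
  Encoding(y) <- "UTF-8"
  y
}

# Clean names, written with \u escapes so the script does not depend
# on the encoding of the source file.
clean   <- c("Bl\u00f6schl, G", "Th\u00f6ni, A.")
garbled <- garble_once(garble_once(clean))     # damaged twice
fixed   <- repair_once(repair_once(garbled))   # repaired twice

print(garbled)
print(identical(fixed, clean))
```

If a string contains a character that has no Windows-1252 equivalent, iconv() returns NA for that element, so the result should be checked for NA values before overwriting the original column.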
EDIT 2: I found this "UTF-8 Character Debug Tool", which lists some (not all) of the problem sequences in its "actual" and "expected" columns. In addition, this "encoding repairer" on GitHub seems to offer what I need, but it is not written in R.