I'm having trouble with the encoding when writing content to a file. It's a very simple case, but I've been trying to get my head around it for a few days.

I have an R file named teste.R with the following code:

teste <- function(fileContent, fileName) {
  fileConn<-file(fileName)
  writeLines(fileContent, fileConn)
  close(fileConn)   
}

page <- ''
page <- paste(page, 'ãõéç')

teste(page, 'file_.html')

I run this code in RStudio in two different ways:

1 - I simply run the teste function in the console.

2 - I run the following command in the console:

source('F:/Dropbox/TESE/Projeto/AnaliseSAA/teste.R')

In both cases the file contains the same content, ãõéç, and so far everything is OK. But if I open the files in a browser, I get two different outputs:

1- ãõéç

2- ãõéç

This matters because I'm creating an HTML page to display analyses made in R, and the pages look terrible with the wrong encoding.

Thanks.

Bruno Ferreira

If you want to create an HTML file, you'd better use an existing tool such as RStudio's htmltools package, or the markdown package. If you don't include the standard tags, including the one declaring the encoding, you're in for a lot of pain.
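The pain is easy to reproduce. Assuming the browser falls back to Windows-1252 when a page declares no charset (a common default on Western-locale Windows systems, though it depends on the browser and its settings), output 2 is exactly the UTF-8 bytes of ãõéç read as one character per byte. A minimal base-R sketch of that misreading:

```r
# The four characters from the question, written as escapes so the result
# does not depend on the encoding this script itself is saved in
x <- "\u00e3\u00f5\u00e9\u00e7"

# Bytes a UTF-8 file actually contains: two bytes per character here
utf8_bytes <- charToRaw(enc2utf8(x))
print(utf8_bytes)
# [1] c3 a3 c3 b5 c3 a9 c3 a7

# Decode those same bytes as Windows-1252 (the assumed browser fallback)
# and convert the result to UTF-8 so R can display it
misread <- iconv(list(utf8_bytes), from = "CP1252", to = "UTF-8")

# Every character turns into two: this is output 2 from the question
identical(misread, "\u00c3\u00a3\u00c3\u00b5\u00c3\u00a9\u00c3\u00a7")
# [1] TRUE
```

Under the same assumption, output 1 displaying correctly suggests that file was written in a single-byte encoding rather than UTF-8; declaring the charset in the page removes the guesswork either way.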

Here's a suggestion that could work for you:

library(htmltools)

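teste <- function(fileContent, fileName) {
  # Wrap the content in <html>, with a <head> that declares the charset
  content <- tags$html(HTML('<head><meta http-equiv="Content-Type" content="text/html;charset=utf-8"></head>'),
                       tags$body(fileContent))
  # Open the output file explicitly with UTF-8 encoding
  outfile <- file(description = fileName, open = "w", encoding = "UTF-8")
  # Write the printed HTML into the file, then close the connection
  capture.output(content, file = outfile)
  close(outfile)
}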

page <- ''
page <- paste(page, 'ãõéç')

teste(page, 'file_.html')
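If adding a dependency is not desirable, the same idea (declare the charset in the page and make sure the bytes on disk really are UTF-8) can be sketched in base R. The function name `write_html_utf8` is hypothetical, and the minimal page skeleton with an HTML5 `<meta charset="utf-8">` tag is an assumption, not what htmltools produces:

```r
# Hypothetical helper: write a minimal HTML page whose bytes are UTF-8
# and whose <head> says so, so the browser does not have to guess
write_html_utf8 <- function(body_text, file_name) {
  html <- c(
    "<!DOCTYPE html>",
    "<html>",
    "<head><meta charset=\"utf-8\"></head>",
    "<body>",
    body_text,
    "</body>",
    "</html>"
  )
  # Open without an encoding argument, so the connection does no re-encoding...
  con <- file(file_name, open = "w")
  on.exit(close(con), add = TRUE)
  # ...then convert the strings to UTF-8 ourselves and write the bytes unchanged
  writeLines(enc2utf8(html), con, useBytes = TRUE)
}

write_html_utf8("\u00e3\u00f5\u00e9\u00e7", "file_.html")
```

Note that `body_text` is inserted verbatim, so any `<`, `>` or `&` in it would need escaping first; htmltools escapes plain-text children of tags such as `tags$body()` for you.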

EDIT

Another approach would be to replace accented characters with a function like this (there might be an existing one in a package, but I'm not aware of one):

repl.accent <- function(x) {
  # Characters to replace...
  accent <- c("À", "à", "Â", "â", "Ç", "ç", "È", "è", "É", "é", "Ê", "ê", "Ë", "ë",
              "Î", "î", "Ï", "ï", "Ñ", "ñ", "Ô", "ô", "Ö", "ö", "Ù", "ù", "Û", "û",
              "Ü", "ü", "'")
  # ...and their HTML numeric character references, in the same order
  repl <- c("&#192;", "&#224;", "&#194;", "&#226;", "&#199;", "&#231;", "&#200;",
            "&#232;", "&#201;", "&#233;", "&#202;", "&#234;", "&#203;", "&#235;",
            "&#206;", "&#238;", "&#207;", "&#239;", "&#209;", "&#241;", "&#212;",
            "&#244;", "&#214;", "&#246;", "&#217;", "&#249;", "&#219;", "&#251;",
            "&#220;", "&#252;", "&#39;")
  # Apply every pattern/replacement pair in turn to each string
  stringi::stri_replace_all_fixed(str = x, pattern = accent, replacement = repl, vectorize_all = FALSE)
}

repl.accent('éçà')
[1] "&#233;&#231;&#224;"
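Rather than listing every accented character by hand (the list above lacks ã and õ, for example), any non-ASCII character can be turned into a numeric character reference built from its Unicode code point. A minimal sketch using base R's `utf8ToInt()` and `intToUtf8()`; the name `to_ncr` is hypothetical, and unlike `repl.accent` it leaves ASCII characters such as `'` untouched:

```r
# Hypothetical helper: replace every non-ASCII character with &#NNN;
to_ncr <- function(x) {
  vapply(enc2utf8(x), function(s) {
    cps <- utf8ToInt(s)                      # Unicode code points of s
    chars <- intToUtf8(cps, multiple = TRUE) # the same characters, one per element
    # Keep ASCII (code point <= 127) as is, encode everything else
    paste0(ifelse(cps > 127, paste0("&#", cps, ";"), chars), collapse = "")
  }, character(1), USE.NAMES = FALSE)
}

to_ncr("\u00e3\u00f5\u00e9\u00e7")  # "ãõéç" written as escapes
# [1] "&#227;&#245;&#233;&#231;"
```

The escapes in the call avoid depending on the encoding the script is saved in, which is the same problem the question runs into with `source()`.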
Dominic Comtois
  • OK, thank you for your advice. I'll explore the R htmltools library from now on. Unfortunately, the issue remains the same: I get exactly the same content in both files, but the second one still has the wrong encoding. It's very strange, because both files look exactly the same in a text editor but different in a browser. – Bruno Ferreira Mar 27 '15 at 22:35
  • See my edit for another approach; you'll need to add the proper codes for ã and õ though. – Dominic Comtois Mar 28 '15 at 10:27
  • I forgot to mention that I tried UTF-8 but it doesn't work either. Fortunately your second suggestion worked; I just need to add a few extra characters. Thank you very much, sir. – Bruno Ferreira Mar 28 '15 at 14:14
  • You're welcome! I modified the first function a bit to ensure that the output file is written with UTF-8 encoding (and also used the HTML4 way of declaring the encoding, since no "doctype" is declared), so that solution normally _should_ work as well now. If you do test it, please let me know whether it works. – Dominic Comtois Mar 29 '15 at 05:37
  • I tried it and yes, it worked! I'll use that approach now instead of the previous one. Thank you once again. – Bruno Ferreira Mar 29 '15 at 20:25