I am retrieving online XML data using the XML and RCurl R packages. My issue is that the UTF-8 encoding is lost during the call to xmlToList: for instance, 'é' is replaced by 'Ã©'. This happens during the XML parsing.
Here is a code snippet with one example where the encoding is lost and another where it is kept (depending on the data source):
library(XML)
library(RCurl)
url = "http://www.bdm.insee.fr/series/sdmx/data/DEFAILLANCES-ENT-FR-ACT/M.AZ+BE.BRUT+CVS-CJO?lastNObservations=2"
res <- getURL(url)
xmlToList(res)
# encoding lost
url2 = "http://www.bdm.insee.fr/series/sdmx/conceptscheme/"
res2 <- getURL(url2)
xmlToList(res2)
# encoding kept
Why does the encoding behave differently between the two sources? I tried setting .encoding = "UTF-8" in getURL, and applying enc2utf8(res) to the result, but neither makes any difference.
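For what it's worth, the garbling looks like UTF-8 bytes being reinterpreted in a single-byte encoding. The sketch below reproduces that pattern in base R only. It assumes Latin-1 as the wrong encoding, which is a guess based on my Windows-1252 locale, and it does not involve the XML package or the INSEE service at all, so it only illustrates the symptom rather than its cause.

```r
# Base R only: no network access and no extra packages.
x <- "\u00e9"                         # 'é' as a UTF-8 string (bytes c3 a9)

# Assumption: the bytes get read as Latin-1 somewhere along the way.
# iconv() takes the raw bytes at face value, so each byte becomes one character.
garbled <- iconv(x, from = "latin1", to = "UTF-8")

print(charToRaw(x))        # bytes of the original string
print(charToRaw(garbled))  # bytes after the wrong reinterpretation
print(nchar(x))            # number of characters before
print(nchar(garbled))      # number of characters after

# Reversing the mistake: convert back to Latin-1 bytes, then mark them as UTF-8.
fixed <- iconv(garbled, from = "UTF-8", to = "latin1")
Encoding(fixed) <- "UTF-8"
print(fixed == x)
```

If the strings returned by xmlToList show the same byte pattern under charToRaw, that would suggest the content is read as Latin-1/Windows-1252 at some point rather than being corrupted on the server side.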
Any help is welcome!
Thanks,
Jérémy
R version 3.2.1 (2015-06-18)
Platform: i386-w64-mingw32/i386 (32-bit)
Running under: Windows 7 (build 7601) Service Pack 1
locale:
[1] LC_COLLATE=French_France.1252 LC_CTYPE=French_France.1252
[3] LC_MONETARY=French_France.1252 LC_NUMERIC=C
[5] LC_TIME=French_France.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] RCurl_1.95-4.7 bitops_1.0-6 XML_3.98-1.3
loaded via a namespace (and not attached):
[1] tools_3.2.1