I import data (names) in UTF-8 format from a *.csv file and would like to print the names later in an xtable and create a document with knitr, via *.Rmd > *.md > PDF.

Example:

data <- paste("úáźýžč",sep="") 

Now, within the Rmd file, I do:

```{r}
print(data)
```

I get the error:

pandoc.exe: Cannot decode byte '\xfa': Data.Text.Internal.Encoding.Fusion.streamUtf8: Invalid UTF-8 stream

How should I approach this? Thanks.

EDIT: sessionInfo()

R version 3.1.1 (2014-07-10)
Platform: x86_64-w64-mingw32/x64 (64-bit)

locale:
[1] LC_COLLATE=English_United Kingdom.1252  LC_CTYPE=English_United Kingdom.1252
[3] LC_MONETARY=English_United Kingdom.1252 LC_NUMERIC=C
[5] LC_TIME=English_United Kingdom.1252

attached base packages:
[1] grDevices datasets  grid      splines   graphics  utils     stats     methods   base     

other attached packages:
 [1] rmarkdown_0.3.3     gdata_2.13.3        xlsx_0.5.7          xlsxjars_0.6.1      rJava_0.9-6         dplyr_0.3.0.2       XML_3.98-1.1        sp_1.0-15
 [9] RCurl_1.95-4.3      bitops_1.0-6        xts_0.9-7           msm_1.4             tseries_0.10-32     knitr_1.7           gridExtra_0.9.1     plyr_1.8.1
[17] xtable_1.7-4        texreg_1.33         latticeExtra_0.6-26 RColorBrewer_1.0-5  lattice_0.20-29     forecast_5.6        timeDate_3010.98    zoo_1.7-11
[25] coxrobust_1.0       survival_2.37-7     stringr_0.6.2       data.table_1.9.4    markdown_0.7.4

loaded via a namespace (and not attached):
 [1] assertthat_0.1   chron_2.3-45     colorspace_1.2-4 DBI_0.3.1        digest_0.6.4     evaluate_0.5.5   expm_0.99-1.1    formatR_1.0      fracdiff_1.4-2
[10] gtools_3.4.1     htmltools_0.2.6  magrittr_1.0.1   Matrix_1.1-4     mvtnorm_1.0-0    nnet_7.3-8       parallel_3.1.1   quadprog_1.5-5   Rcpp_0.11.3
[19] reshape2_1.4     tools_3.1.1
Maximilian
  • What OS and R version are you running? What does `Encoding(data)` return? – MrFlick Oct 21 '14 at 18:07
  • Good point: I just checked and `Encoding(data)` gives "unknown". I'm running the newest version of R on Windows 7. Still, a solution to the example above would give me a hint; I cannot solve that one either. Thanks – Maximilian Oct 21 '14 at 18:13
  • When I copy/paste the example into Windows 7, I see the encoding set to "latin1" (which is what I expect on Windows). Are you sure the `csv` is encoded using UTF-8 and not latin1? How did you import it? Either way, it would be odd if you explicitly set the encoding during import and it somehow reverted to "unknown". – MrFlick Oct 21 '14 at 18:26
  • Please `update.packages(ask=FALSE)` if you have not done so. Then please include `library(rmarkdown);library(knitr);sessionInfo()`, as well as a minimal, complete, and reproducible example. – Yihui Xie Oct 21 '14 at 18:29
  • @MrFlick; I imported the csv as `read.csv("file.csv", header=TRUE, fill=TRUE, encoding="UTF8")`. I also tried importing without explicitly stating the encoding. I tried `Encoding(org.data$Name) <- "UTF-8"` and it gives me "Dru\u009estvo" when it should be "Družstvo". Not sure if this helps. – Maximilian Oct 21 '14 at 18:41
  • It just sounds like you have the wrong encoding. I don't think your file is UTF-8. If you want any more help, please create a [reproducible example](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) otherwise we're just guessing. – MrFlick Oct 21 '14 at 18:51
  • Sure, but this would require disclosing the data itself, which would not be a problem, but SO's current framework doesn't support data upload. A solution to the example above would help, however. Does the example work fine with your setup? – Maximilian Oct 21 '14 at 18:54
  • I have suggestions for using Unicode in R in this article: http://shiny.rstudio.com/articles/unicode.html Even though your question is not about shiny, I believe you will still find the article useful. – Yihui Xie Oct 23 '14 at 06:47
  • Thanks, that seems very helpful indeed! I solved the issue by importing the UTF-8 characters first into grid.table etc. I will read the materials you are referring to and possibly post a solution, or delete the question altogether. Thanks – Maximilian Oct 23 '14 at 07:56
  • @Max Please post a solution instead of deleting the post to benefit other people. – Yihui Xie Oct 24 '14 at 02:49
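
For readers hitting the same pandoc message, here is a minimal sketch of what the byte in the error may indicate. It is not a confirmed fix for this question. It assumes the string reaches pandoc in a single-byte Windows encoding (the session locale above is code page 1252) rather than UTF-8; in those code pages the byte `0xfa` is "ú". The variable names and the commented-out `fileEncoding` value are illustrative assumptions only.

```r
# Assumption: the offending byte from the pandoc error (0xfa) came from a
# string stored in a single-byte encoding such as latin1 / CP1252.
x <- rawToChar(as.raw(0xfa))   # one raw byte, not valid UTF-8 on its own
Encoding(x) <- "latin1"        # declare what the byte is assumed to mean

# Re-encode to UTF-8, which is what pandoc expects to read.
y <- enc2utf8(x)
z <- iconv(x, from = "latin1", to = "UTF-8")  # equivalent explicit conversion

Encoding(y)    # "UTF-8"
charToRaw(y)   # c3 ba, the two-byte UTF-8 form of "ú"

# Hypothetical: if the csv itself is in a Windows code page, declaring it at
# import time may avoid the problem. The code page below is a guess, and the
# encoding names accepted by iconv depend on the platform.
# org.data <- read.csv("file.csv", header = TRUE, fileEncoding = "CP1250")
```

Note that `encoding=` in `read.csv` only marks the strings as being in a given encoding, whereas `fileEncoding=` re-encodes the file's contents while reading, so the two are not interchangeable.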

0 Answers