1

I am successfully modifying multiple Excel files using library(RDCOMClient). However, setting a cell value to a non-ascii string results in å becoming Ã¥ etc. I also cannot pass an UTF-8 filename to Excel's Open() and Save() methods. Hopefully there is a single solution to both problems.

Here's a simple reproducible example using Save(): Creating an empty workbook and trying to save it as å.xlsx results in Ã¥.xlsx. The same operation works fine for a.xlsx.

# install.packages("RDCOMServer", repos = "http://www.omegahat.net/R")
library(RDCOMClient)

xlApp <- COMCreate("Excel.Application")
wb <- xlApp$Workbooks()$Add()
path1 <- normalizePath("a.xlsx", mustWork = FALSE)
path2 <- normalizePath("å.xlsx", mustWork = FALSE)
wb$SaveAs(path1)
#> [1] TRUE
wb$SaveAs(path2)
#> [1] TRUE
wb$Close()
#> [1] TRUE
xlApp$Quit()
#> NULL
dir(pattern = "xlsx")
#> [1] "a.xlsx"  "Ã¥.xlsx"

devtools::session_info()
#> - Session info ---------------------------------------------------------------
#>  setting  value                       
#>  version  R version 4.0.0 (2020-04-24)
#>  os       Windows 10 x64              
#>  system   x86_64, mingw32             
#>  ui       RTerm                       
#>  language (EN)                        
#>  collate  English_United States.1252  
#>  ctype    English_United States.1252  
#>  tz       Europe/Berlin               
#>  date     2020-05-10                  
#> 
#> - Packages -------------------------------------------------------------------
#>  package     * version date       lib source        
#>  assertthat    0.2.1   2019-03-21 [1] CRAN (R 4.0.0)
#>  backports     1.1.6   2020-04-05 [1] CRAN (R 4.0.0)
#>  callr         3.4.3   2020-03-28 [1] CRAN (R 4.0.0)
#>  cli           2.0.2   2020-02-28 [1] CRAN (R 4.0.0)
#>  crayon        1.3.4   2017-09-16 [1] CRAN (R 4.0.0)
#>  desc          1.2.0   2018-05-01 [1] CRAN (R 4.0.0)
#>  devtools      2.3.0   2020-04-10 [1] CRAN (R 4.0.0)
#>  digest        0.6.25  2020-02-23 [1] CRAN (R 4.0.0)
#>  ellipsis      0.3.0   2019-09-20 [1] CRAN (R 4.0.0)
#>  evaluate      0.14    2019-05-28 [1] CRAN (R 4.0.0)
#>  fansi         0.4.1   2020-01-08 [1] CRAN (R 4.0.0)
#>  fs            1.4.1   2020-04-04 [1] CRAN (R 4.0.0)
#>  glue          1.4.0   2020-04-03 [1] CRAN (R 4.0.0)
#>  highr         0.8     2019-03-20 [1] CRAN (R 4.0.0)
#>  htmltools     0.4.0   2019-10-04 [1] CRAN (R 4.0.0)
#>  knitr         1.28    2020-02-06 [1] CRAN (R 4.0.0)
#>  magrittr      1.5     2014-11-22 [1] CRAN (R 4.0.0)
#>  memoise       1.1.0   2017-04-21 [1] CRAN (R 4.0.0)
#>  pkgbuild      1.0.7   2020-04-25 [1] CRAN (R 4.0.0)
#>  pkgload       1.0.2   2018-10-29 [1] CRAN (R 4.0.0)
#>  prettyunits   1.1.1   2020-01-24 [1] CRAN (R 4.0.0)
#>  processx      3.4.2   2020-02-09 [1] CRAN (R 4.0.0)
#>  ps            1.3.2   2020-02-13 [1] CRAN (R 4.0.0)
#>  R6            2.4.1   2019-11-12 [1] CRAN (R 4.0.0)
#>  Rcpp          1.0.4.6 2020-04-09 [1] CRAN (R 4.0.0)
#>  RDCOMClient * 0.94-0  2020-04-13 [1] local         
#>  remotes       2.1.1   2020-02-15 [1] CRAN (R 4.0.0)
#>  rlang         0.4.5   2020-03-01 [1] CRAN (R 4.0.0)
#>  rmarkdown     2.1     2020-01-20 [1] CRAN (R 4.0.0)
#>  rprojroot     1.3-2   2018-01-03 [1] CRAN (R 4.0.0)
#>  sessioninfo   1.1.1   2018-11-05 [1] CRAN (R 4.0.0)
#>  stringi       1.4.6   2020-02-17 [1] CRAN (R 4.0.0)
#>  stringr       1.4.0   2019-02-10 [1] CRAN (R 4.0.0)
#>  testthat      2.3.2   2020-03-02 [1] CRAN (R 4.0.0)
#>  usethis       1.6.1   2020-04-29 [1] CRAN (R 4.0.0)
#>  withr         2.2.0   2020-04-20 [1] CRAN (R 4.0.0)
#>  xfun          0.13    2020-04-13 [1] CRAN (R 4.0.0)
#>  yaml          2.2.1   2020-02-01 [1] CRAN (R 4.0.0)
#> 
#> [1] C:/R-library
#> [2] C:/Program Files/R/R-4.0.0/library

Created on 2020-05-10 by the reprex package (v0.3.0)

Neither Encoding(path2) <- "UTF-8" nor Sys.setlocale(category = "LC_ALL", locale = "en_US.UTF-8") seem to have any effect. ("OS reports request to set locale to "en_US.UTF-8" cannot be honored".)

I couldn't figure out how Excel COM handles Unicode from the docs. This Visual Basic-related question may perhaps be relevant, but I can't figure out how to apply it to my issue.

EDIT: I worked around the filename issue by opening/saving a temporary copy, sheetname issues by referring to them by number instead of name, and cell values by rephrasing things. Still interested in a proper solution 8-)

Jon Olav Vik
  • 1,421
  • 2
  • 12
  • 20
  • I don't know R, neither how it's storing the script you show (ie: file encoding of the source code), but you must pass Unicode-encoded string to Excel methods (not UTF8, not ansi) – Simon Mourier May 10 '20 at 16:55
  • Thanks. Unfortunately, the backslash in the Unicode escape gets interpreted as the root folder; e.g. `wb$SaveAs(stringi::stri_escape_unicode("å.xlsx"))` gives `Microsoft Excel cannot access the file 'c:\E112B100'`. – Jon Olav Vik May 10 '20 at 20:27

2 Answers2

0

stringi::stri_enc_tonative() was what I needed.

I had UTF-8 strings text and sheets returned by readxl::read_excel() and readxl::excel_sheets(), so that Encoding(text) was "UTF-8" whereas Excel evidently requires "latin1" on my system. Replacing text with stringi::stri_enc_tonative(text) solved all my issues: filenames for xlApp$Open(), sheetnames for wb$Open(), and values for rng[["Value"]] <-.

Jon Olav Vik
  • 1,421
  • 2
  • 12
  • 20
-1

One can use stringi::stri_enc_toutf8() to convert to UTF-8.

Toby Speight
  • 27,591
  • 48
  • 66
  • 103
  • Welcome to Stack Overflow! *"I don't have enough reputation to comment"* **does not mean that you should use an Answer instead**. It means that you need to learn how to use the site, and when you are sufficiently experienced, you will be granted the [privilege](/help/privileges) of commenting. Until then, please [provide answers that don't require clarification from the asker](//meta.stackexchange.com/q/214173). – Toby Speight Mar 03 '21 at 08:37