I'm generating documentation using roxygen2 and Rdpack and, when using inline citations, see encoding errors when I build documentation with R CMD Rd2pdf MyPackage --no-clean (per Diagnosing R package build warning: "LaTeX errors when creating PDF version").

! Package textcomp Error: Symbol \textcurrency not provided by
(textcomp)                font family ptm in TS1 encoding.
(textcomp)                Default family used instead.

See the textcomp package documentation for explanation.
Type  H <return>  for immediate help.
 ...                                              
                                                  
l.1523 Wä
            gele et al. (2009);

These appear to arise because non-ASCII characters are being included in .Rd files via \insertCite{}. Can I use this mechanism to cite authors whose names include diacritics?


Here's a minimal documentation section:

#' Sample function
#'
#' Problematic citation to \insertCite{Wagele2009;textual}{MyPackage}
#'
#' @references
#' \insertAllCited{}
#' @encoding UTF-8
Foo <- function (x) x

inst/REFERENCES.bib contains (minimally):

@article{Wagele2009,
  author = {W{\"a}gele, J W and W{\"a}gele, H},
  year = {2009},
}

The DESCRIPTION file includes Encoding: UTF-8.

Martin Smith
  • The line 1523 error looks like it contains UTF-8 characters interpreted as Latin1. (I think the last two chars are the ä; not sure what the first two are.) So it looks as though there's been some kind of conversion back and forth between Latin1 and UTF-8. If you look at the .Rd file, does it contain the correct character, or is the error there already? – user2554330 Sep 14 '21 at 15:37
  • The .Rd file contains \insertCite{Wagele2009}. My suspicion is that Rdpack is replacing this with a Latin1 character at some point in the PDF construction process. – Martin Smith Sep 14 '21 at 18:39
  • Right, of course. I took a look at this. On my Mac where UTF-8 is native, things are fine. On Windows I see the problem you had. I also tried running the `Rdpack::insert_ref` function in R, and it returns the right thing. So it looks like an R bug, not an Rdpack bug: R receives a string properly encoded in UTF-8 and treats it as Latin1. I know the R core group is tired of these issues, and has a test version of R that works entirely in UTF-8, but I haven't tried your issue in it. – user2554330 Sep 14 '21 at 19:00
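
The kind of mix-up described in these comments can be reproduced in plain R, independently of Rdpack. The sketch below is illustrative only: it assumes nothing about where in the Rd2pdf pipeline the conversion happens, and the variable names are made up for the example. It takes the UTF-8 bytes of "Wägele" and reinterprets them as Latin-1, which turns the single ä into two characters. A second round (assuming the local iconv recognises the name "CP1252") gives four characters in place of ä, which is consistent with the four-character sequence in the LaTeX log.

```r
# Store the name as UTF-8; the letter a-umlaut occupies the two bytes c3 a4.
x <- enc2utf8("W\u00e4gele")
print(charToRaw(x))

# Treat those UTF-8 bytes as if they were Latin-1 and convert "back" to UTF-8.
# Each byte of the a-umlaut now becomes a character of its own.
once <- iconv(x, from = "latin1", to = "UTF-8")
print(utf8ToInt(once))  # code points, independent of how the console renders them
print(nchar(once))      # one character longer than the original

# Repeat the mistake. "CP1252" is an assumption about the local iconv;
# the result is NA if that encoding name is not available.
twice <- iconv(once, from = "CP1252", to = "UTF-8")
print(nchar(twice))
```

Comparing `nchar(x)` with `nchar(once)` is a quick way to tell whether a string has been through this sort of round trip.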

1 Answer

After some further searching I've found an answer elsewhere suggesting that this is due to Windows' non-native handling of UTF-8 encoding, which [edit: not in R 4.0, as I originally wrote] is being addressed in a future release in the R 4.x series. Unless other readers have further suggestions, it looks like it may have to be a case of "wait a while"...
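
To see whether a given R session is affected, one option is to ask R whether it is running in a UTF-8 locale. This is a minimal sketch, not a guaranteed diagnostic: it only reports the session's locale and says nothing about how Rd2pdf or LaTeX will behave. `native_utf8` is a name chosen for the example.

```r
# l10n_info() describes the current locale; its "UTF-8" element is TRUE
# when the session's native encoding is UTF-8.
info <- l10n_info()
native_utf8 <- isTRUE(info[["UTF-8"]])

if (native_utf8) {
  message("Session is running in a UTF-8 locale.")
} else {
  message("Session is not in a UTF-8 locale; non-ASCII text may be re-encoded.")
}
```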

Martin Smith
  • Yes, I just tried the experimental build, and it seemed to work. If you want to try it, you can get it by following the instructions at https://svn.r-project.org/R-dev-web/trunk/WindowsBuilds/winutf8/ucrt3/howto.html . – user2554330 Sep 14 '21 at 19:15
  • Thanks! It won't get me through CRAN's tests for now, but good to know that I will be able to come back and restore accents once this fix is released. – Martin Smith Sep 15 '21 at 05:50
  • I didn't try running R CMD check, I just saw that accents looked fine in a sample help page. But hopefully CRAN will relax its tests once every platform can run in UTF-8. – user2554330 Sep 15 '21 at 08:31