I am returning to some old code that I am sure worked in the past. Since I last used it, I've upgraded dplyr to version 1.0.0. Unfortunately, devtools::install_version("dplyr", version="0.8.5") gives me an error whilst compiling, so I can't perform a regression test.

I am trying to create a tidy version of the mcmc class from the runjags package. The mcmc class is essentially a (large) two-dimensional matrix of arbitrary size. It's likely to have several (tens of) thousands of rows and the column names are relevant and (as in my toy data below) potentially awkward. There is also useful information in the attributes of the mcmc object. Hence the somewhat convoluted approach I've taken. It needs to be completely generic.

* Toy data *

# load("./data/oCRMPosteriorShort.rda")
# x <- head(oCRMPosteriorShort$mcmc[[1]])
# dput(x)
x <- structure(c(7.27091686833247, 5.72764789439587, 5.72103479848012, 
            7.43825337823404, 8.59970106873194, 8.03081445451, 9.16248677241767, 
            3.09793571064081, 4.66492638321819, 3.19480526258532, 5.1159808007229, 
            6.08361682213139, 5.05973067601884, 4.14556598358942, 0.95900563867179, 
            0.88584483221691, 0.950304627720881, 1.13467524314569, 1.44963882689823, 
            1.19907577185321, 1.15968445234753), .Dim = c(7L, 3L), .Dimnames = list(
              c("5001", "5003", "5005", "5007", "5009", "5011", "5013"), 
              c("alpha[1]", "alpha[2]", "beta")), mcpar = c(5001, 5013, 
                                                            2), class = "mcmc")

* Stage 1: Code that works: *

a <- attributes(x)
colNames <- a$dimnames[[2]]
#Get the sample IDs (from the attributes of x) and add the chain index 
base <- tibble::enframe(a$dimnames[[1]], value="Sample") %>%
          tibble::add_column(Chain=1, .before=1) %>%
          dplyr::select(-.data$name)
# Create a list of tibbles, defining each as the contents of base plus the contents of the ith column of x, 
# plus the name of the ith column in Temp.
t <- lapply(1:length(colNames), function(i) d <- cbind(base) %>% tibble(Temp=colNames[i], Value=x[,colNames[i]]))

At this point, I have a list of tibbles. Each tibble contains four columns: Chain (with the value 1 for every observation in every tibble in this case), Sample (with values taken from the first dimension of the dimnames attribute of x), Temp (with values of alpha[1], alpha[2] and beta in elements 1, 2 and 3 of the list) and Value (the value of the mcmc object in cell [Sample, Temp]).

* Stage 2: Here's the problem... *

It should be a simple matter to row-bind the list into a single tidy tibble containing (some of) the information I need. But:

# row_bind the list of tibbles into a single object
rv <- dplyr::bind_rows(t)
# Error: `vec_ptype2.double.double()` is implemented at C level.
# This R function is purely indicative and should never be called.
# Run `rlang::last_error()` to see where the error occurred.

* Questions *

I can't see what I'm doing wrong here. (And even if I were doing something wrong, I'd expect a more user-friendly, higher-level error message.) I can't find any references to this error anywhere on the web.

  1. Does anyone have any idea what's going on?
  2. Would someone run the code using dplyr v0.8.x and report what they see?

I'd appreciate your thoughts.

* Update *

It looked as if the problem had been resolved by a reboot, but it has now returned. Even when these tibbles cause the error, a related example from the online documentation works:

one <- starwars[1:4, ]
two <- starwars[9:12, ]
bind_rows(list(one, two))

runs without problems.

Context:

> # Context
> R.Version()$version.string
[1] "R version 3.6.3 (2020-02-29)"
> packageVersion("dplyr")
[1] ‘1.0.0’
> Sys.info()["version"]
                                                                                            version 
"Darwin Kernel Version 18.7.0: Mon Apr 27 20:09:39 PDT 2020; root:xnu-4903.278.35~1/RELEASE_X86_64" 
Limey
  • With `dplyr` 0.8.3 I get: `Error: Argument 1 can't be a list containing data frames`. When trying out `do.call("rbind", t)` I get the following error message: `Error in `.rowNamesDF<-`(x, value = value) : duplicate 'row.names' are not allowed In addition: Warning message: non-unique values when setting 'row.names': ‘1’, ‘2’, ‘3’, ‘4’, ‘5’, ‘6’, ‘7’` – starja Jun 07 '20 at 12:55
  • The `Value` column is of class `mcmc` and presumably some method for this class is tripping up `bind_rows()`. So perhaps try `purrr::map_df(t, modify_at, "Value", unclass)`. – Ritchie Sacramento Jun 07 '20 at 15:24
  • @27ϕ9 now it's my turn to be unable to replicate: `class(t[[1]]$Value)` and `typeof(t[[1]]$Value)` give me `"numeric"` and `"double"` respectively. So I don't think that can be the problem... `Value` is a column from within the `mcmc` object, not an `mcmc` object itself. – Limey Jun 07 '20 at 15:34
  • @Limey - it is I think. The `runjags` package needs to be loaded before running your code for the issue to be replicated. It has a method for `[` which results in `Value=x[,colNames[i]]` remaining class `mcmc`. If it's not loaded then the class isn't kept. – Ritchie Sacramento Jun 07 '20 at 15:37
  • @27ϕ9: Ah-ha! So it does. Thank you. Which suggests that `t <- lapply(1:length(colNames), function(i) d <- cbind(base) %>% tibble(Temp=colNames[i], Value=as.double(x[,colNames[i]])))` is a simple fix. (Simpler than anything `purrr`-related, at least.) Does that work for you? – Limey Jun 07 '20 at 15:41
  • @27ϕ9. Right-oh. If you post an answer, I will accept. Thanks again. – Limey Jun 07 '20 at 15:45
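
Following the diagnosis in the comments above (the `Value` column keeps class `mcmc` when the package that supplies a `[` method for that class is loaded), here is a minimal sketch of the `as.double()` fix. The toy matrix `m` and the helper name `tidy_one` are hypothetical and are not part of the original code; the sketch assumes the tibble and dplyr packages are installed and has not been checked against dplyr 0.8.x.

```r
library(tibble)
library(dplyr)

# Hypothetical stand-in for an mcmc object: a small matrix carrying class "mcmc".
m <- structure(
  matrix(c(7.27, 5.73, 3.10, 4.66, 0.96, 0.89), nrow = 2,
         dimnames = list(c("5001", "5003"),
                         c("alpha[1]", "alpha[2]", "beta"))),
  mcpar = c(5001, 5003, 2),
  class = "mcmc"
)

# Build one long-format tibble for a single column of m.
tidy_one <- function(col) {
  tibble(
    Chain  = 1,
    Sample = dimnames(m)[[1]],
    Temp   = col,
    # as.double() drops all attributes, including any lingering "mcmc" class,
    # so bind_rows() only ever sees plain double vectors.
    Value  = as.double(m[, col])
  )
}

pieces <- lapply(dimnames(m)[[2]], tidy_one)
rv <- bind_rows(pieces)
```

To confirm that the class has been stripped before binding, `vapply(pieces, function(d) class(d$Value)[1], character(1))` should report `"numeric"` for every element.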
