I have encountered some very strange memory behavior in R when accessing slots of a custom S4 class in which I have applied row names to a matrix.

Imagine I have the S4 class below, with two slots holding identical matrices. The only difference is that for one of them I have applied row names via the function rownames():

```r
#' TestClass
#' 
#' @slot matrixWithRowNames A matrix where rownames() is applied.
#' @slot matrixWithNoRowNames A matrix where rownames() has not been applied
#' @export TestClass

TestClass <- setClass(
  "TestClass",
  slots = list(
    matrixWithRowNames   = "matrix",
    matrixWithNoRowNames = "matrix"
  )
)

#' initialize
#' @name init_TestClass
#' @docType methods
#' @export

setMethod("initialize", signature("TestClass"),
          function(.Object) {
            nrow <- 1000
            ncol <- 10000
            .Object@matrixWithNoRowNames <- matrix(runif(nrow * ncol), nrow)
            .Object@matrixWithRowNames <- .Object@matrixWithNoRowNames

            rownames(.Object@matrixWithRowNames) <- c(1:nrow)

            return(.Object)
          })
```

When I measure the memory usage, there is a huge difference between the two slots the first time I operate on them. In the example below I initialize the class and then calculate the column sums of each matrix, using peakRAM to monitor the memory consumption.

```r
library(peakRAM)

testObject <- TestClass()
peakRAM({
  colSums_1 <- colSums(testObject@matrixWithRowNames)
})
peakRAM({
  colSums_2 <- colSums(testObject@matrixWithNoRowNames)
})
```

In the peakRAM output, operating on the matrix with row names gives a peak RAM usage of 76.4 MiB, roughly the size of the whole matrix. Operating on the matrix without row names uses almost no RAM, as expected.
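A quick sanity check of that figure, as a sketch: `n_row` and `n_col` below simply mirror the values hard-coded in `initialize`. A 1000 × 10000 matrix of doubles at 8 bytes each is 8e7 bytes, about 76.3 MiB, which is consistent with the peak corresponding to one full copy of the matrix.

```r
# Dimensions hard-coded in the question's initialize method
n_row <- 1000
n_col <- 10000

# Raw payload: one 8-byte double per cell
bytes <- n_row * n_col * 8
bytes / 2^20  # payload in MiB, about 76.3

# What R reports for a real matrix of that shape (payload plus a small header/dim attribute)
m <- matrix(runif(n_row * n_col), n_row)
print(object.size(m), units = "MiB")
```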

This memory consumption only happens the first time I access a slot with row names.

Does anybody know why R behaves this way? And is there a way to use row names without this issue? In my actual code I have some very large objects, and this behavior can make the code crash.

STHOH
  • Assigning row names is causing a copy of the entire object to be made - see [this](https://stackoverflow.com/questions/74029805/why-does-adding-attributes-to-a-dataframe-take-longer-with-large-dataframes/74030156) related question. – SamR Aug 04 '23 at 08:33
  • The copying only seems to take place when I first evaluate the object. Could it be that on assignment with `rownames()` the row names are only lazily initialized, and then truly assigned when I access the object for the first time? – STHOH Aug 04 '23 at 08:58
    I don't have time to look into it fully but have you tried changing `.Object@matrixWithNoRowNames<-matrix(runif(nrow*ncol),nrow)` to `.Object@matrixWithNoRowNames<-matrix(runif(nrow*ncol),nrow, dimnames=list(1:nrow))`? And then removing the now redundant `rownames(.Object@matrixWithRowNames)<-c(1:nrow)` line? Suspect if there's already a memory allocation for the row names attribute when the matrix is created the copy may never occur. – SamR Aug 04 '23 at 09:26
    That seems to be working! At least it works in the simple example above. Thank you so much for the suggestion. – STHOH Aug 04 '23 at 09:33
  • But then again, this only works if I can assign the row names at the time of creating the matrix. In my code I have to assign the row names after creating the matrix. The code shown in the example is just to recreate the issue and not my actual code. – STHOH Aug 04 '23 at 09:39
    Luckily it also seems to work if I just assign the names by using `dimnames()` as for example: `dimnames(.Object@matrixWithRowNames)<-list(1:nrow)` – STHOH Aug 04 '23 at 09:45
  • I haven't tested but suspect if you create any row names when you create the matrix, even just a vector of `NA` it would probably not copy the object if you modify them later. However given you're creating an S4 object and row names can be idiosyncratic in various other ways (such as becoming a character vector), why not just create a slot which is a vector of the length of `nrow(matrixWithNoRowNames)` to store the info you need? – SamR Aug 04 '23 at 14:07
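Putting the comments about `dimnames` together, here is a sketch of the question's class with the `rownames()` line swapped for `dimnames<-`, which STHOH reports avoids the copy. Turning `nrow` and `ncol` into arguments (so the sketch can be run at a small size) and calling `callNextMethod()` are additions of mine, not from the thread. The sketch only shows that the row names come out right; it does not demonstrate the memory behavior, so the `peakRAM()` comparison from the question should be re-run on the full-size object to confirm.

```r
library(methods)

# Same slots as TestClass in the question
TestClass <- setClass(
  "TestClass",
  slots = list(
    matrixWithRowNames   = "matrix",
    matrixWithNoRowNames = "matrix"
  )
)

# Assumption: nrow/ncol exposed as arguments (defaults match the question)
setMethod("initialize", "TestClass",
          function(.Object, ..., nrow = 1000, ncol = 10000) {
            .Object <- callNextMethod(.Object, ...)
            .Object@matrixWithNoRowNames <- matrix(runif(nrow * ncol), nrow)
            .Object@matrixWithRowNames <- .Object@matrixWithNoRowNames

            # Instead of rownames(...) <- ..., set the row names via dimnames<-
            # (row names first, NULL for the column names); integers become character.
            # Alternative from the comments: pass the same dimnames list to matrix()
            # when the matrix is created.
            dimnames(.Object@matrixWithRowNames) <- list(seq_len(nrow), NULL)

            .Object
          })

testObject <- TestClass(nrow = 3, ncol = 2)
rownames(testObject@matrixWithRowNames)  # "1" "2" "3"
```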

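SamR's last suggestion, keeping the labels out of the matrix attributes entirely, could look something like this. The class name `LabelledMatrix`, its slot names and the validity check are hypothetical, not taken from the question; the point is only that the matrix never carries dimnames, and labels are looked up with `match()`.

```r
library(methods)

# Hypothetical class: the matrix has no dimnames; row labels live in their own slot
setClass(
  "LabelledMatrix",
  slots = list(values = "matrix", rowLabels = "character"),
  validity = function(object) {
    if (length(object@rowLabels) != nrow(object@values)) {
      "length(rowLabels) must equal nrow(values)"
    } else {
      TRUE
    }
  }
)

x <- new("LabelledMatrix",
         values = matrix(runif(6), nrow = 3),
         rowLabels = c("a", "b", "c"))

# Look up a row by its label without touching the matrix attributes
x@values[match("b", x@rowLabels), ]
```

Note that assigning with `x@rowLabels <- ...` does not rerun the validity function, so call `validObject(x)` after changing the labels if the length check matters.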