TLDR: How do I sort objects without having to introduce a new S3 class globally?
In R, we need to introduce S3 classes to sort custom objects (see, e.g., this answer). Here is an example where I sort a list of strings by their length.
`[.customSort` <- function(x, i, ...) structure(unclass(x)[i], class = "customSort")
`==.customSort` <- function(a, b) nchar(a[[1]]) == nchar(b[[1]])
`>.customSort` <- function(a, b) nchar(a[[1]]) > nchar(b[[1]])
customObject <- structure(list('abc', 'de', 'fghi'), class = 'customSort')
unlist(sort(customObject))
# [1] "de" "abc" "fghi"
In my R package, I want to offer a sort function mySort(..., compare). But instead of going through the create-an-S3-class ordeal, the user should be able to supply a compare function (similar to implementations in Python, C++, Java, Go, etc.).
# Try 1
mySort <- function(someList, compare) {
  `[.tmpclass` <- function(x, i, ...) structure(unclass(x)[i], class = 'tmpclass')
  `==.tmpclass` <- function(a, b) compare(a[[1]], b[[1]]) == 0
  `>.tmpclass` <- function(a, b) compare(a[[1]], b[[1]]) > 0
  class(someList) <- 'tmpclass'
  sort(someList)
}
# Try 2
mySort <- function(someList, compare) {
  local({
    class(someList) <- 'tmpclass'
    sort(someList)
  }, envir = list(
    `[.tmpclass` = function(x, i, ...) structure(unclass(x)[i], class = 'tmpclass'),
    `==.tmpclass` = function(a, b) compare(a[[1]], b[[1]]) == 0,
    `>.tmpclass` = function(a, b) compare(a[[1]], b[[1]]) > 0
  ))
}
l <- list('hello', 'world', 'how', 'is', 'everything')
# sort by char length
mySort(l, compare = function(a,b) nchar(a) - nchar(b))
While comparisons work as expected at the top level of mySort, all "memory" of that temporary S3 class is lost once sort is called. So, things like someList[1] > someList[2] produce the expected result when debugging right before the sort call, but once I step into the sort call, the methods are no longer found.
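The behavior described above can be reproduced in isolation. The following is a minimal sketch, not taken from the package: `compare_elsewhere` and `demo` are hypothetical names, and the sketch assumes that S3 methods are looked up starting from the frame in which the comparison is evaluated, so a method defined inside one function is not visible to a comparison that runs inside a function defined elsewhere.

```r
# Hypothetical helper defined at top level; its enclosure is not demo()'s frame.
compare_elsewhere <- function(a, b) a > b

demo <- function() {
  # Method that exists only in this function's local environment
  `>.tmpclass` <- function(e1, e2) "local method"
  x <- structure(list(1), class = "tmpclass")
  y <- structure(list(2), class = "tmpclass")

  # Evaluated in this frame, so the local method is found
  direct <- x > y

  # Evaluated inside compare_elsewhere(), where the local method is not visible;
  # tryCatch guards against the fallback comparison signalling an error
  indirect <- tryCatch(compare_elsewhere(x, y), error = function(e) "error")

  list(direct = direct, indirect = indirect)
}

res <- demo()
res$direct    # result of the local method
res$indirect  # whatever the built-in comparison does; not the local method
```

If this assumption holds, the same thing would happen to every comparison performed by the functions that sort calls internally.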
Curiously enough, I do get one step further if I explicitly set the environment of the sort function.
environment(sort) <- environment()
sort(someList)
Through this, if I debug and step into sort, I am still able to make comparisons. However, once sort calls further underlying functions, this information is lost again.
The same goes if I try to call order (which is also called by sort at some point). If I set the environment for order before calling it, comparisons work fine when debugging and stepping into that function. But once order calls xtfrm(x), this information is seemingly lost again.
mySort <- function(someList, compare) {
  `[.tmpclass` <- function(x, i, ...) structure(unclass(x)[i], class = 'tmpclass')
  `==.tmpclass` <- function(a, b) compare(a[[1]], b[[1]]) == 0
  `>.tmpclass` <- function(a, b) compare(a[[1]], b[[1]]) > 0
  class(someList) <- 'tmpclass'
  environment(order) <- environment()
  order(someList)
}
l <- list('hello', 'world', 'how', 'is', 'everything')
mySort(l, compare = function(a,b) nchar(a) - nchar(b))
Since xtfrm is a primitive function that I can't seem to debug, I have a hunch that it may be what is actually causing the problem. But I'm not sure.
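One way to probe this hunch without a debugger is to inspect where the functions involved live. This is only a sketch of such an inspection; the variable names are made up for illustration, and the interpretation (that comparisons made inside base functions are dispatched from base's frames rather than from mySort's) is an assumption, not something confirmed here.

```r
# xtfrm itself is a primitive, so it has no R-level body to step through
xtfrm_is_primitive <- is.primitive(xtfrm)

# Its default method is an ordinary closure whose environment is the base namespace
default_is_closure <- is.function(xtfrm.default) && !is.primitive(xtfrm.default)
default_env_name <- environmentName(environment(xtfrm.default))

xtfrm_is_primitive
default_is_closure
default_env_name

# The body shows which functions the default method hands the object to
body(xtfrm.default)
```

If the methods are looked up from the base namespace and then the global environment, that would be consistent with the global-assignment workaround succeeding while the local definitions are not found.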
Finally, it does actually work if I use some tacky global-environment version.
mySort <- function(someList, compare) {
  # initialize globally
  `[.tmpclass` <<- function(x, i, ...) structure(unclass(x)[i], class = 'tmpclass')
  `==.tmpclass` <<- function(a, b) compare(a[[1]], b[[1]]) == 0
  `>.tmpclass` <<- function(a, b) compare(a[[1]], b[[1]]) > 0
  oldClass <- class(someList)
  class(someList) <- 'tmpclass'
  result <- sort(someList)
  # make sure not to leave garbage behind
  remove('[.tmpclass', '==.tmpclass', '>.tmpclass', envir = .GlobalEnv)
  structure(result, class = oldClass)
}
l <- list('hello', 'world', 'how', 'is', 'everything')
unlist(mySort(l, compare = function(a,b) nchar(a) - nchar(b)))
# [1] "is" "how" "hello" "world" "everything"
However, this does not feel like a solid answer, let alone something easily accepted by CRAN (unless there is some way to create unique names that don't accidentally overwrite global variables?)
Is there a way to sort objects using a simple comparison function without introducing an S3 class globally? Or should I write my own sort algorithm now?
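For reference, the "write my own sort algorithm" route could look roughly like the sketch below. It is a plain recursive merge sort, not a drop-in replacement for sort: `mergeSortBy` is a hypothetical name, and it assumes compare(a, b) returns a negative number, zero, or a positive number, as in the examples above. It needs no S3 class at all, at the cost of running in interpreted R.

```r
# Hypothetical helper: stable merge sort driven by a user-supplied compare()
mergeSortBy <- function(someList, compare) {
  n <- length(someList)
  if (n <= 1) return(someList)

  # Sort both halves recursively
  mid <- n %/% 2
  left <- mergeSortBy(someList[seq_len(mid)], compare)
  right <- mergeSortBy(someList[(mid + 1):n], compare)

  # Merge the two sorted halves
  result <- vector("list", n)
  i <- 1; j <- 1; k <- 1
  while (i <= length(left) && j <= length(right)) {
    # On ties, take from the left half so equal elements keep their order
    if (compare(left[[i]], right[[j]]) <= 0) {
      result[k] <- left[i]
      i <- i + 1
    } else {
      result[k] <- right[j]
      j <- j + 1
    }
    k <- k + 1
  }

  # Copy whatever remains in the half that was not exhausted
  if (i <= length(left)) result[k:n] <- left[i:length(left)]
  if (j <= length(right)) result[k:n] <- right[j:length(right)]
  result
}

l <- list('hello', 'world', 'how', 'is', 'everything')
sorted <- unlist(mergeSortBy(l, compare = function(a, b) nchar(a) - nchar(b)))
sorted
# [1] "is" "how" "hello" "world" "everything"
```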