How to order a list by a custom function, discarding duplicates?

Question

I have this list:

thresholds <- list(
     list(color="red", value=100),
     list(color="blue", value=50),
     list(color="orange", value=100),
     list(color="green", value=1),
     list(color="orange", value=50)
)

I want to order it by the "value" field of each element and discard duplicates so that no two elements have the same "value" field in the resulting list (the element that gets picked when there's a tie doesn't matter).

sort and unique don't work with complex lists and don't permit a custom ordering. How can I achieve the desired result?

Another approach: ``vals <- sapply(thresholds, `[[`, "value"); thresholds[match(unique(sort(vals)), vals)]`` — cryo111, Nov 08 '19 at 12:42

score 7 · Answer 1 · answered Nov 08 '19 at 00:04

First of all, in this particular case, the actual vector to order is:

values <- sapply(thresholds, function (t) t$value)
# values == c(100, 50, 100, 1, 50)

You can adjust the function inside sapply for your needs (for instance, do the appropriate casting depending on whether you want to sort in numeric or alphabetical order, etc.).
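As an illustration of adjusting the extractor, here is a minimal sketch using vapply, which behaves like sapply but checks the type of each result. The variant where the values are stored as strings (thresholds_chr) is a made-up example and not part of the question.

```r
thresholds <- list(
  list(color = "red", value = 100),
  list(color = "blue", value = 50),
  list(color = "orange", value = 100),
  list(color = "green", value = 1),
  list(color = "orange", value = 50)
)

# Numeric key: vapply stops with an error if an element is not a single number
values <- vapply(thresholds, function(t) t$value, numeric(1))

# Alphabetical key: extract the color instead of the value
colors <- vapply(thresholds, function(t) t$color, character(1))

# Hypothetical variant where the values were read in as strings:
# cast inside the extractor so that the key is numeric
thresholds_chr <- list(
  list(color = "red", value = "100"),
  list(color = "blue", value = "50")
)
values_chr <- vapply(thresholds_chr, function(t) as.numeric(t$value), numeric(1))
```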

From this point, if we were to keep the duplicates, the answer would simply be:

thresholds[order(values)]

order does not return the rank of each element; it returns the permutation of indices that sorts the vector, i.e. the position of the smallest element first, then the position of the second smallest, and so on. Here order(values) is 4 2 5 1 3. Then, thresholds[order(values)] returns the elements of thresholds at these indices, whose values are 1 50 50 100 100.
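To make the behaviour of order concrete, here is a small sketch. The vector x is an extra example chosen so that the output of order and rank differ; it is not from the question.

```r
values <- c(100, 50, 100, 1, 50)

# order() gives the positions of the elements, smallest value first
idx <- order(values)
idx            # 4 2 5 1 3
values[idx]    # 1 50 50 100 100, the same as sort(values)

# order() and rank() answer different questions
x <- c(30, 10, 20)
order(x)       # 2 3 1: where to find the 1st, 2nd and 3rd smallest element
rank(x)        # 3 1 2: the rank of each element in place
```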

However, since we want to remove duplicates, it cannot be as simple as that. unique won't work on thresholds and if we apply it to values, it will lose the correspondence with the indices in the original list.

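The solution is to use another function, namely duplicated. When applied to a vector, duplicated returns a logical vector indicating, for each element, whether the same value already appears at an earlier position. For instance, duplicated(values) returns FALSE FALSE TRUE FALSE TRUE. Because we are filtering the indices returned by order, the duplicate check has to run on the values in sorted order, values[ordering], so that each flag lines up with the index it filters. Here that also gives FALSE FALSE TRUE FALSE TRUE, but the two vectors differ in general (with values 2 1 1, for example).

The solution is therefore:

ordering <- order(values)
nodups <- ordering[!duplicated(values[ordering])]
thresholds[nodups]

or as a one-liner:

thresholds[order(values)[!duplicated(values[order(values)])]]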
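As a quick check, the steps can be wrapped in a small function. This is a sketch: order_unique_by is a hypothetical helper name and not part of base R, and the duplicate filter is computed on the keys in sorted order so that it lines up with the indices from order. The list tricky is an extra test input whose smallest value does not come first.

```r
# Hypothetical helper: order a list by a numeric key and keep one element
# per distinct key value
order_unique_by <- function(lst, key) {
  keys <- vapply(lst, key, numeric(1))
  ord <- order(keys)
  # duplicated() runs on the sorted keys so each flag matches an index in ord
  lst[ord[!duplicated(keys[ord])]]
}

get_value <- function(t) t$value

thresholds <- list(
  list(color = "red", value = 100),
  list(color = "blue", value = 50),
  list(color = "orange", value = 100),
  list(color = "green", value = 1),
  list(color = "orange", value = 50)
)
res <- order_unique_by(thresholds, get_value)
res_values <- vapply(res, get_value, numeric(1))   # 1 50 100

# Input where the smallest value appears after a larger one
tricky <- list(list(value = 2), list(value = 1), list(value = 1))
tricky_values <- vapply(order_unique_by(tricky, get_value), get_value, numeric(1))  # 1 2
```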

score 2 · Accepted Answer · answered Nov 08 '19 at 10:28

Adding another alternative, for completeness, regarding the "custom sort"/"custom unique" part of the question. By defining methods for certain functions (as seen in ?xtfrm) we can apply custom sort and unique functions to any list (or other object).

First, a "class" attribute needs to be added:

class(thresholds) = "thresholds"

Then, define the necessary custom functions:

"==.thresholds" = function(x, y) return(x[[1]][["value"]] == y[[1]][["value"]])
">.thresholds" = function(x, y) return(x[[1]][["value"]] > y[[1]][["value"]])
"[.thresholds" = function(x, i) return(structure(.subset(x, i), class = class(x)))
is.na.thresholds = function(x) return(is.na(x[[1]][["value"]]))

Now, we can apply sort:

sort(thresholds)

Finally, add a custom unique function:

duplicated.thresholds = function(x, ...) return(duplicated(sapply(x, function(elt) elt[["value"]])))
unique.thresholds = function(x, ...) return(x[!duplicated(x)])

And:

sort(unique(thresholds))

(Similar answers and more information here and here)

Ooooh, very nice. I think I like your solution better than mine actually. I came across the xtfrm man page and everything but didn't figure what to make of it, until your answer. — lgeorget, Nov 08 '19 at 11:50
@lgeorget : Your approach is, still, more straightforward and much faster than defining custom class methods but, I guess, it's fun to exploit some object-oriented functionalities of R in such cases :) — alexis_laz, Nov 08 '19 at 14:15

score 0 · Answer 3 · answered Nov 08 '19 at 04:40

If you like curly brackets you could do:

thresholds[{o <- order(v <- unlist(Map(`[`, thresholds, 2)))}[!duplicated(v[o])]]
# [[1]]
# [[1]]$color
# [1] "green"
# 
# [[1]]$value
# [1] 1
# 
# 
# [[2]]
# [[2]]$color
# [1] "blue"
# 
# [[2]]$value
# [1] 50
# 
# 
# [[3]]
# [[3]]$color
# [1] "red"
# 
# [[3]]$value
# [1] 100

To impose a custom ordering, append a further index vector in brackets at the end:

thresholds[{o <- order(v <- unlist(Map(`[`, thresholds, 2)))}[!duplicated(v[o])][c(3, 1, 2)]]
# [[1]]
# [[1]]$color
# [1] "red"
# 
# [[1]]$value
# [1] 100
# 
# 
# [[2]]
# [[2]]$color
# [1] "green"
# 
# [[2]]$value
# [1] 1
# 
# 
# [[3]]
# [[3]]$color
# [1] "blue"
# 
# [[3]]$value
# [1] 50
