Library of Congress Classification numbers are used in libraries to give call numbers to items so they can be ordered on the shelf. They can be simple or quite complex, with a few mandatory parts and many optional ones. (See "entering call numbers in 050" on 050 Library of Congress Call Number for how they break down, or lc_callnumber for a Ruby tool that sorts them.)
I would like to sort by LCC number in R. I've looked at Sort a list of nontrivial elements in R and Sorting list of list of elements of a custom class in R? but haven't got it figured out.
Here are four call numbers, entered in sorted order:
call_numbers <- c("QA 7 H3 1992", "QA 76.73 R3 W53 2015", "QA 90 H33 2016", "QA 276.45 R3 A35 2010")
sort() sorts them character by character, so 276 < 7 < 76.73 < 90:
> sort(call_numbers)
[1] "QA 276.45 R3 A35 2010" "QA 7 H3 1992" "QA 76.73 R3 W53 2015" "QA 90 H33 2016"
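The same effect shows up on the numeric part alone. A minimal sketch (the variable names char_cmp and num_cmp are only illustrative, not part of my real code) comparing the two values as strings and then as numbers:

```r
# Compared as strings, "2" comes before "7", so "276.45" sorts first.
char_cmp <- "276.45" < "7"

# Compared as numbers, 276.45 is of course larger than 7.
num_cmp <- as.numeric("276.45") < as.numeric("7")

print(c(character = char_cmp, numeric = num_cmp))
```

So whatever does the sorting has to treat the class number as a number rather than as text.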
To sort them properly I think I'd have to define a class and then some methods on it, like this:
library(stringr)
class(call_numbers) <- "LCC"
## Just pick out the letters and digits for now, leave the rest
## until sorting works, then work down more levels.
lcc_regex <- '([[:alpha:]]+?) ([[:digit:]\\.]+?) (.*)'
"<.LCC" <- function(x, y) {
  x_lcc <- str_match(x, lcc_regex)
  y_lcc <- str_match(y, lcc_regex)
  if(x_lcc[2] < y_lcc[2]) return(x)
  if(as.integer(x_lcc[3]) < as.integer(y_lcc[3])) return(x)
}
"==.LCC" <- function(x, y) {
  x_lcc <- str_match(x, lcc_regex)
  y_lcc <- str_match(y, lcc_regex)
  x_lcc[2] == y_lcc[2] && x_lcc[3] == y_lcc[3]
}
">.LCC" <- function(x, y) {
  x_lcc <- str_match(x, lcc_regex)
  y_lcc <- str_match(y, lcc_regex)
  if(x_lcc[2] > y_lcc[2]) return(x)
  if(as.integer(x_lcc[3]) > as.integer(y_lcc[3])) return(x)
}
This doesn't change the sort order. I haven't defined a subset method ("[.myclass") because I have no idea what it should be.
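For comparison, here is a rough sketch of the ordering I am after, done without a custom class by building sort keys and passing them to order(). It only looks at the class letters and the class number and ignores the Cutters and the year; the names shuffled, parts, lcc_class, lcc_number and sorted are made up for this illustration, and base R regexec()/regmatches() stand in for stringr.

```r
# The four call numbers from above, deliberately out of order.
call_numbers <- c("QA 7 H3 1992", "QA 76.73 R3 W53 2015",
                  "QA 90 H33 2016", "QA 276.45 R3 A35 2010")
shuffled <- rev(call_numbers)

# Capture the class letters and the class number at the start of each string.
parts <- regmatches(shuffled,
                    regexec("^([[:alpha:]]+) ([[:digit:].]+)", shuffled))

# Element 1 is the whole match, 2 the letters, 3 the number.
lcc_class  <- vapply(parts, function(p) p[2], character(1))
lcc_number <- vapply(parts, function(p) as.numeric(p[3]), numeric(1))

# Order by letters first, then by the number compared numerically.
sorted <- shuffled[order(lcc_class, lcc_number)]
print(sorted)
```

This gives back the order I entered at the top, but it is not the class-based sort() I was hoping to get working, and it would need more keys to break ties on the Cutters and year.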