Library of Congress Classification numbers are used in libraries to give call numbers to items so they can be ordered on the shelf. They can be simple or quite complex, with a few mandatory parts and many optional ones. (See "entering call numbers in 050" on 050 Library of Congress Call Number for how they break down, or lc_callnumber for a Ruby tool that sorts them.)

I would like to sort by LCC number in R. I've looked at "Sort a list of nontrivial elements in R" and "Sorting list of list of elements of a custom class in R?" but haven't figured it out.

Here are four call numbers, entered in sorted order:

call_numbers <- c("QA 7 H3 1992", "QA 76.73 R3 W53 2015", "QA 90 H33 2016", "QA 276.45 R3 A35 2010")

sort sorts them by character, so 276 < 7 < 76.73 < 90.

> sort(call_numbers)
[1] "QA 276.45 R3 A35 2010" "QA 7 H3 1992"          "QA 76.73 R3 W53 2015"  "QA 90 H33 2016"       

To sort them properly I think I'd have to define a class and then some methods on it, like this:

library(stringr)
class(call_numbers) <- "LCC"

## Just pick out the letters and digits for now, leave the rest
## until sorting works, then work down more levels.
lcc_regex <- '([[:alpha:]]+?) ([[:digit:]\\.]+?) (.*)'

"<.LCC" <- function(x, y) {
    x_lcc <- str_match(x, lcc_regex)
    y_lcc <- str_match(y, lcc_regex)
    if(x_lcc[2] < y_lcc[2]) return(x)
    if(as.integer(x_lcc[3]) < as.integer(y_lcc[3])) return(x)
}
"==.LCC" <- function(x, y) {
    x_lcc <- str_match(x, lcc_regex)
    y_lcc <- str_match(y, lcc_regex)
    x_lcc[2] == y_lcc[2] && x_lcc[3] == y_lcc[3]
}

">.LCC" <- function(x, y) {
    x_lcc <- str_match(x, lcc_regex)
    y_lcc <- str_match(y, lcc_regex)
    if(x_lcc[2] > y_lcc[2]) return(x)
    if(as.integer(x_lcc[3]) > as.integer(y_lcc[3])) return(x)
}

This doesn't change the sort order. I haven't defined a subset method ("[.myclass") because I have no idea what it should be.

William Denton
  • The relational operators you've defined need to return either TRUE or FALSE. Change the last two lines (the ones containing the "if" clauses) of "<" and ">" to `(x_lcc[2] < y_lcc[2]) || (as.integer(x_lcc[3]) < as.integer(y_lcc[3]))` and `(x_lcc[2] > y_lcc[2]) || (as.integer(x_lcc[3]) > as.integer(y_lcc[3]))`, respectively, and define a `"["` method to preserve the class after subsetting (e.g. `"[.LCC" = function(x, i) structure(.subset(x, i), class = class(x))`). With those changes, `sort(call_numbers)` seems to work appropriately. – alexis_laz Jul 21 '17 at 16:32
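
A different route from the one in the comment, sketched here under assumptions rather than taken from the question, is to skip the comparison operators and give the class an `xtfrm` method. `order()` and `sort()` call `xtfrm()` on classed objects to get a numeric key that sorts in the same order as the object. The constructor `as_lcc` is a hypothetical helper, and the parsing assumes the simple "letters, space, number, space, rest" layout of the four examples.

```r
# Hypothetical constructor: tag a character vector with class "LCC".
as_lcc <- function(x) structure(as.character(x), class = "LCC")

# xtfrm method: return a numeric key that sorts like the call numbers.
# Assumes each call number is "LETTERS NUMBER REST", separated by spaces.
xtfrm.LCC <- function(x) {
    parts <- strsplit(unclass(x), " ", fixed = TRUE)
    class_letters <- vapply(parts, `[`, character(1), 1)
    # Compare the class number as a decimal, so 7 < 76.73 < 90 < 276.45
    class_number <- as.numeric(vapply(parts, `[`, character(1), 2))
    rest <- vapply(parts, function(p) paste(p[-(1:2)], collapse = " "),
                   character(1))
    # Turn the multi-field ordering into one rank per element
    ord <- order(class_letters, class_number, rest)
    key <- integer(length(x))
    key[ord] <- seq_along(ord)
    key
}

shuffled <- as_lcc(c("QA 276.45 R3 A35 2010", "QA 90 H33 2016",
                     "QA 7 H3 1992", "QA 76.73 R3 W53 2015"))
sorted <- as.character(sort(shuffled))
print(sorted)
```

The remainder of each call number is still compared as plain text here, so this covers only the first two levels the question set out to handle.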

4 Answers

This might be a simpler approach. It assumes every call number has the following format: 2-letter code, space, number, space, letter-number, space, ..., year.

The strategy is to split the LOC number on spaces, extract the first three fields as separate vectors, and then pass those vectors to the order function so that each one is used in turn to break ties in the previous one.

call_numbers <- c("QA 7 H3 1992", "QA 76.73 R3 W53 2015", "QA 90 H33 2016", "QA 276.45 R3 A35 2010")

# Split on the spaces
parts <- strsplit(call_numbers, " ")
# Retrieve the 2-letter code (named class_letters to avoid masking base R's `letters`)
class_letters <- sapply(parts, function(x) x[1])
# Retrieve the 2nd group and convert it to numeric values for sorting
second <- sapply(parts, function(x) as.numeric(x[2]))
# Obtain the 3rd group
third <- sapply(parts, function(x) x[3])
# Find the year (not used in the sort below)
year <- sapply(parts, function(x) x[length(x)])

# Sort by the letter code, then the number, then the 3rd group
call_numbers[order(class_letters, second, third)]

For this limited dataset the technique works.

Dave2e

I feel like I spent way too much time figuring out a solution to exactly what you're trying to do, only mine was for JavaScript. It basically comes down to "normalizing" these numbers so that they can be sorted alphabetically.

Maybe this solution can be used and ported over to R. At the very least, hopefully this could get you started. It involves some regular expressions and a little bit of extra scripting to get the call numbers into a state where they can be sorted.

https://github.com/rayvoelker/js-loc-callnumbers/blob/master/locCallClass.js
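
As a rough illustration of the normalization idea in R (a sketch under assumptions, not a port of the linked JavaScript), the class number can be zero-padded to a fixed width so that plain character ordering agrees with numeric ordering. The function name `normalize_lcc`, the regular expression, and the field widths are all choices made for this example, and only the "letters, number, rest" layout from the question is handled.

```r
# Hypothetical helper: build a key that sorts correctly as plain text.
# Assumes the "LETTERS NUMBER REST" layout used in the question.
normalize_lcc <- function(x) {
    parts <- regmatches(x, regexec("^([A-Z]+) ([0-9.]+) ?(.*)$", x))
    vapply(parts, function(p) {
        # p[1] is the full match; p[2], p[3], p[4] are the capture groups.
        # Pad the letters to 3 characters and the number to 4 digits
        # on each side of the decimal point.
        sprintf("%-3s %09.4f %s", p[2], as.numeric(p[3]), p[4])
    }, character(1))
}

shuffled <- c("QA 276.45 R3 A35 2010", "QA 90 H33 2016",
              "QA 7 H3 1992", "QA 76.73 R3 W53 2015")
keys <- normalize_lcc(shuffled)
print(keys)

# Ordering by the normalized keys sorts the original call numbers
by_key <- shuffled[order(keys)]
print(by_key)
```

Keeping the keys separate from the original strings means the displayed call numbers stay untouched while the padded form is used only for ordering.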

Good luck!

ray_voelker

mixedsort from the gtools package turns out to do just the trick:

library(gtools)
call_numbers <- c("QA 7 H3 1992", "QA 76.73 R3 W53 2015", "QA 90 H33 2016", "QA 276.45 R3 A35 2010")
mixedsort(call_numbers)
## [1] "QA 7 H3 1992"          "QA 76.73 R3 W53 2015"  "QA 90 H33 2016"        "QA 276.45 R3 A35 2010"

Further, mixedorder can be used to sort a data frame by one column.
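
A minimal sketch of the data frame case, assuming the gtools package is installed: `mixedorder` returns the row permutation, which is then used to index the data frame. The `books` data frame and its `shelf` column are made up for illustration.

```r
library(gtools)

# Hypothetical data frame; the shelf labels are placeholders.
books <- data.frame(
    call_number = c("QA 276.45 R3 A35 2010", "QA 90 H33 2016",
                    "QA 7 H3 1992", "QA 76.73 R3 W53 2015"),
    shelf = c("d", "c", "a", "b"),
    stringsAsFactors = FALSE
)

# mixedorder gives the permutation; index the rows with it
books_sorted <- books[mixedorder(books$call_number), ]
print(books_sorted)
```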

This is a special case of what was answered earlier in How to sort a character vector where elements contain letters and numbers in R?

William Denton

The easiest (and most elegant) way: use str_sort from the stringr package.

# install.packages("stringr") ## Uncomment if not already installed
library(stringr)

str_sort(call_numbers, numeric = TRUE)

[1] "QA 7 H3 1992"          "QA 76.73 R3 W53 2015"  "QA 90 H33 2016"       
[4] "QA 276.45 R3 A35 2010"
JDie