I would like to write a function that sorts a data frame by a column whose name is passed in as an argument. This elementary exercise has of course been asked many times before, but the suggested solutions either hard-code the column name as a literal, and hence cannot be used in a function (also here, there, and there), or depend on the column's position, which makes for brittle programming (also here).

What I'm looking for has apparently been dubbed "referential transparency". Fine. But using that term here would take many words to define and to distinguish the literals in a program, so an MWE is best.

What should the body of the function sort.by.column contain, so that

sort.by.column <- function(df, column.name) {
    ## ??
}

df1 <- data.frame(Instrument=c("B","A"),
                  Value=c(3,2))
df2 <- data.frame(Device=c("D","C"),
                  Value=c(5,4))
column.name.1 <- "Instrument"
sorted1 <- sort.by.column(df1, column.name.1)
column.name.2 <- "Device"
sorted2 <- sort.by.column(df2, column.name.2)

will work for both df1 and df2?

Vrokipal
  • what is wrong with using `dplyr::arrange`? i.e. `sorted1 <- arrange(df1, Instrument)` - note that no quotes are required. (if you do need the quotes, use `arrange_(df1, column.name.1)`) – Melissa Key Jun 18 '18 at 17:44
  • @MelissaKey I too thought that `arrange_` will do the trick nicely, until I read that `arrange_` is deprecated. It seems weird to deprecate the more general of the (`arrange`/`arrange_`) pair, and so I'm asking to figure out the logic of it all. – Vrokipal Jun 18 '18 at 18:14

3 Answers

Here's a wrapper for dplyr::arrange to take text:

library(dplyr)
sort.by.column <- function(df, column.name) {
  col <- sym(column.name)
  arrange(df, !!col)
}
Melissa Key
  • I see that it works, but I'd like to know why. The double negation is clear enough, but what is the significance of treating a symbol, the result of the `sym` function, as a Boolean? – Vrokipal Jun 18 '18 at 19:04
  • Actually, the `!!` is not being used as a double negation. See https://dplyr.tidyverse.org/articles/programming.html for a good introduction on this. The `rlang` package (which is imported into `dplyr`) is the replacement for `arrange_` and other such variants. – Melissa Key Jun 18 '18 at 19:12
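
To make the role of `!!` in the answer above concrete, here is a minimal sketch, assuming the `dplyr` and `rlang` packages are installed. `rlang::expr()` shows the call that results once `!!` has spliced the symbol in, and the function name `sort.by.column.data` is hypothetical: it illustrates the `.data` pronoun, an alternative that looks the column up by its string name without building a symbol first.

```r
# Sketch only: requires the dplyr and rlang packages.
df1 <- data.frame(Instrument = c("B", "A"), Value = c(3, 2),
                  stringsAsFactors = FALSE)

# `!!` unquotes: the symbol stored in `col` is inserted into the call,
# so arrange() sees a bare column name rather than the variable `col`.
col <- rlang::sym("Instrument")
spliced <- rlang::expr(dplyr::arrange(df1, !!col))
print(spliced)
# dplyr::arrange(df1, Instrument)

# Hypothetical alternative: the `.data` pronoun indexes the data frame's
# columns by string, so no symbol and no `!!` are needed.
sort.by.column.data <- function(df, column.name) {
  dplyr::arrange(df, .data[[column.name]])
}

sorted1 <- sort.by.column.data(df1, "Instrument")
print(sorted1)
```

Both forms avoid the deprecated `arrange_`; which one reads better is largely a matter of taste.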
You can write the function sort.by.column using order as:

sort.by.column <- function(df, column.name) {
  df[order(df[,column.name]),]
}    

# Let's test the function

sort.by.column(df1, "Value")
#   Instrument Value
# 2          B     2
# 1          A     3
sort.by.column(df2, "Value")
#   Device Value
# 2      D     4
# 1      C     5

sort.by.column(df1, "Instrument")
#   Instrument Value
# 1          A     3
# 2          B     2
sort.by.column(df2, "Device")
#   Device Value
# 1      C     5
# 2      D     4

Data:

df1 <- data.frame(Instrument=c("A","B"),
                  Value=c(3,2))
df2 <- data.frame(Device=c("C","D"),
                  Value=c(5,4))

Explanation

It works like this. With the initialization

df <- data.frame(Instrument=c("B","A"), Value=c(3,2))
col.name <- "Instrument"

the expression

df[,col.name]

returns that column as a plain vector (when a single column is selected this way, `[` drops the data-frame structure by default). The expression

order(df[,col.name])

does not sort the vector itself; it returns the permutation of row indices that would put the vector in ascending order. Finally

df[order(df[,col.name]),]

returns a row-subset of the dataframe, with the row indices in the order computed.
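
The three steps above can be checked one at a time. The following is a minimal base-R sketch. `stringsAsFactors = FALSE` is set explicitly so that the column is a character vector on any R version, and `sort.by.column.safe` is a hypothetical hardened variant, not part of the original answer, that adds `drop = FALSE` so the result stays a data frame even when `df` has only one column.

```r
df <- data.frame(Instrument = c("B", "A"), Value = c(3, 2),
                 stringsAsFactors = FALSE)
col.name <- "Instrument"

# Step 1: selecting a single column with [ , name] yields a vector.
column <- df[, col.name]
is.data.frame(column)
# FALSE

# Step 2: order() returns the indices that would sort the vector.
idx <- order(column)
idx
# 2 1

# Step 3: index the rows by that permutation.
sorted <- df[idx, ]

# Hypothetical hardened variant: [[ ]] always extracts a vector, and
# drop = FALSE keeps the result a data frame for one-column inputs.
sort.by.column.safe <- function(df, column.name, decreasing = FALSE) {
  df[order(df[[column.name]], decreasing = decreasing), , drop = FALSE]
}

one.col <- sort.by.column.safe(data.frame(x = c(2, 1)), "x")
```

Without `drop = FALSE`, sorting a one-column data frame with the original function would return a bare vector instead of a data frame.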

Vrokipal
MKR
  • Can you confirm that the explanation that I just appended to your answer is accurate, or provide one? – Vrokipal Jun 18 '18 at 19:06
  • @Vrokipal Your description of the answer is appropriate. I have approved your edit as well. Moreover, I don't suggest using any `library` for this function, as the result can be achieved using base R. – MKR Jun 18 '18 at 21:01
I'm collecting here the solutions to confirm, compare, and understand.

s1 <- function(df, column.name) {
    # arrange_() is deprecated (see the comments on the question);
    # it is kept here for comparison only.
    dplyr::arrange_(df, column.name)
}


# version which does not require a `library(dplyr)` call
a50915367 <- function(df, column.name) {
    col <- rlang::sym(column.name) # dplyr::sym also works
    # `!!` is syntax interpreted by arrange(), so it needs no namespace prefix
    dplyr::arrange(df, !!col)
}

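# Same function written with attached names; this definition overrides
# the one above and requires library(dplyr) to be called first.
library(dplyr)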
a50915367 <- function(df, column.name) {
    col <- sym(column.name) 
    arrange(df, !!col)    
}

a50915313 <- function(df, column.name) {
    df[order(df[,column.name]),]
}

f <- function(sorting.function) {
    df1 <- data.frame(Instrument=c("B","A"), Value=c(3,2))
    df2 <- data.frame(Device=c("D","C"), Value=c(5,4))

    column.name.1 <- "Instrument"
    sorted1 <- sorting.function(df1, column.name.1)
    column.name.2 <- "Device"
    sorted2 <- sorting.function(df2, column.name.2)

    ret.list <- list(r1=sorted1,
                     r2=sorted2)
    ret.list
}

g <- function() {
    cat("----------------s1----------------\n")
    print(f(s1))
    cat("----------------a50915367----------------\n")
    print(f(a50915367))
    cat("----------------a50915313----------------\n")
    print(f(a50915313))
}

g()
Melissa Key
Vrokipal
  • calling a library inside a function is usually bad practice. I've edited your code to reflect how to call these functions without calling `library` first. – Melissa Key Jun 18 '18 at 19:36