Nth smallest value for every column in data.frame in R

Question

I want to find the nth smallest number for every column in a data.frame.

In the example below I try to get the second smallest value using dplyr's nth function. Can someone help me write the function?

library(vegan)     # decostand()
library(vegclust)  # wetland data
library(dplyr)     # nth()
data(wetland)
dfnorm = decostand(wetland, "normalize")
dfchord = dist(dfnorm, method = "euclidean")
dfchord = data.frame(as.matrix(dfchord))
number_function = function(x) nth(x, 2) # can change 2 to any number

answer_vector = apply(dfchord, 2, number_function) # here, 2 specifies applying over columns

The expected answer would look something like this:

ans = c(0.5689322, 0.579568297, 0.315017693, 0.315017693, 0.632246369, 0.868563003, 0.704638684, 0.35827587, 0.725220337, 0.516397779) # first 10 values; the full result has one value per column

It sounds like a bit of a strange thing to do. To make this more readable for yourself and your colleagues in the future, you might want to [melt](http://seananderson.ca/2013/10/19/reshape.html) and then [split-apply-combine](http://stackoverflow.com/questions/26664644/use-dplyrs-group-by-to-perform-split-apply-combine) — citynorman, Jan 17 '17 at 03:42

score 4 · Answer 1 · answered Jan 17 '17 at 05:31

Just a warning: if you don't specify order_by for dplyr's nth(), it does not sort the values at all:

For example,

> sapply(mtcars, dplyr::nth, 2)
    mpg     cyl    disp      hp    drat      wt    qsec      vs      am    gear    carb 
 21.000   6.000 160.000 110.000   3.900   2.875  17.020   0.000   1.000   4.000   4.000

which is actually just the second row of the data:

> mtcars[2,]
              mpg cyl disp  hp drat    wt  qsec vs am gear carb
Mazda RX4 Wag  21   6  160 110  3.9 2.875 17.02  0  1    4    4
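The difference is easy to see on a small vector. This sketch uses base R only, and the toy vector `x` is invented for illustration: indexing by position returns the second element, which is what `dplyr::nth(x, 2)` does without `order_by`, while sorting first returns the second smallest value.

```r
# Toy vector, invented for illustration
x <- c(21, 10.4, 33.9, 15)

second_element  <- x[2]        # position-based, like nth(x, 2) without order_by
second_smallest <- sort(x)[2]  # value-based: sort ascending, then take position 2

print(c(second_element = second_element, second_smallest = second_smallest))
```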

The nth function in Rfast does sort by default:

> sapply(mtcars, Rfast::nth, 2)
   mpg    cyl   disp     hp   drat     wt   qsec     vs     am   gear   carb 
10.400  4.000 75.700 62.000  2.760  1.615 14.600  0.000  0.000  3.000  1.000

If performance matters, the Rfast version was written to scale well by using a partial sort, unlike solutions that fully sort, order or rank each column (including dplyr::nth).
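Base R's `sort()` can also do a partial sort through its `partial` argument: `sort(x, partial = n)` only guarantees that position `n` holds the value it would have in a full sort, which is all that is needed here. A minimal sketch applying it to every column; the helper name `nth_smallest_partial` is hypothetical, and the columns are assumed to be numeric:

```r
# Hypothetical helper: nth smallest value of every (numeric) column,
# using base R's partial sort instead of a full sort.
nth_smallest_partial <- function(df, n) {
  vapply(df, function(col) {
    col <- col[!is.na(col)]    # drop NAs so that n counts only observed values
    sort(col, partial = n)[n]  # only position n is guaranteed to be in sorted place
  }, numeric(1))
}

nth_smallest_partial(mtcars, 2)
```

This is not Rfast's implementation, just the same partial-sort idea using a function that ships with R.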

Rfast also has a function colnth, which is even faster than using sapply with nth, but it requires you to specify, for each column, which nth value you want. — Stefanos, Nov 06 '18 at 16:05
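If you need a different n for each column, as colnth expects, the idea can be sketched in base R with `mapply()`. This is not Rfast's API; the helper name `nth_per_column` and its `ns` argument are hypothetical:

```r
# Hypothetical helper: the k-th smallest value of each column,
# where ns[j] is the k to use for column j.
nth_per_column <- function(df, ns) {
  stopifnot(length(ns) == ncol(df))  # exactly one order statistic per column
  # mapply walks the columns of df and the entries of ns in parallel;
  # the result is named after the columns of df.
  mapply(function(col, k) sort(col, partial = k)[k], df, ns)
}

nth_per_column(mtcars[, c("mpg", "hp", "wt")], c(1, 2, 3))
```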

cuttlefish44 · Accepted Answer · 2017-01-17T03:39:47.040


Here is my example:

num_func <- function(x, n) nth(sort(x), n)
sapply(dfchord, num_func, n = 2)  # edited (thanks for @thelatemail's comment)
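Since dfchord depends on vegan and vegclust, here is the same sort-then-index approach sketched in base R on the built-in `mtcars` data. `sort(x)[n]` plays the role of `nth(sort(x), n)`, and `vapply()` is swapped in for `sapply()` so the result is guaranteed to be a numeric vector; the name `num_func_base` is hypothetical:

```r
# Sort each column ascending, then take the element at position n.
# Note: sort() drops NAs by default, so n counts only non-missing values.
num_func_base <- function(x, n) sort(x)[n]

second_smallest <- vapply(mtcars, num_func_base, numeric(1), n = 2)
second_smallest
```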

edited Jan 17 '17 at 03:39

answered Jan 17 '17 at 03:35

cuttlefish44


Nate · Answer 3 · 2017-01-17T03:59:53.983


Since you already use dplyr, here is what I do nowadays with purrr:

purrr::map_dbl(mtcars, ~nth(., 2, order_by = .))
   mpg    cyl   disp     hp   drat     wt   qsec     vs     am   gear   carb 
10.400  4.000 75.700 62.000  2.760  1.615 14.600  0.000  0.000  3.000  1.000

or with just dplyr, since it's already loaded for nth():

summarise_all(mtcars, funs(nth(., 2, order_by = .)))
   mpg cyl disp hp drat    wt qsec vs am gear carb
1 10.4   4 75.7 62 2.76 1.615 14.6  0  0    3    1
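Here `order_by = .` tells nth() to order each column by its own values before picking the nth element. In base R that corresponds roughly to indexing with `order()`; a sketch, with the hypothetical helper name `nth_ordered`:

```r
# Hypothetical helper: reorder x by its own values, then take position n.
# Unlike sort(), order() keeps NAs and places them last.
nth_ordered <- function(x, n) x[order(x)][n]

vapply(mtcars, nth_ordered, numeric(1), n = 2)
```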

edited Jan 17 '17 at 03:59

answered Jan 17 '17 at 03:37

Nate



Without packages - `mtcars[sapply(mtcars, rank, ties.method="first")==2]` – thelatemail Jan 17 '17 at 03:43
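The `ties.method = "first"` part matters: plain `rank()` averages ties, so in `mtcars$mpg` the two cars tied at 10.4 both get rank 1.5 and no element has rank exactly 2. Breaking ties by position guarantees that each column has exactly one element with rank n. A sketch that wraps the one-liner so the result keeps the column names; the helper name `nth_by_rank` is hypothetical:

```r
# Hypothetical helper wrapping the rank-based one-liner.
nth_by_rank <- function(df, n) {
  # Rank within each column; ties are broken by position, so each column
  # has exactly one row with rank n (assuming n <= nrow(df) and no NAs).
  ranks <- sapply(df, rank, ties.method = "first")
  # Logical-matrix indexing returns the selected values in column order.
  setNames(df[ranks == n], names(df))
}

nth_by_rank(mtcars, 2)
```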

score 1 · Answer 4 · answered May 20 '20 at 06:38


If you are looking for a faster alternative to nth and sort, there is the topn function in the kit package on CRAN; see its documentation for details.

answered May 20 '20 at 06:38

Suresh_Patel


score 0 · Answer 5 · answered Jan 17 '17 at 03:29

Here is an answer that gets any nth value across the columns of any data.frame; you only need to change n.

x = dfchord
n = 2  # which order statistic to extract (2 = second smallest)
ans = list(small = numeric(ncol(x)))  # preallocate one slot per column

for (i in seq_len(ncol(x))) {
  y = sort(x[, i], decreasing = FALSE)
  ans$small[i] = y[n]  # the nth smallest value in column i
}
ans$rel = rownames(x)  # site labels; dfchord is square, so rows match columns

answer = data.frame('nth' = ans$small, 'rel' = ans$rel)

score 0 · Answer 6 · answered Jan 17 '17 at 06:40

With dplyr::summarize_each:

n <- 2
dfchord %>% summarize_each(funs(nth(sort(.),n)))
#          X5        X8       X13        X4       X17       X3        X9       X21       X16       X14        X2       X15        X1        X7
# 1 0.5689322 0.5795683 0.3150177 0.3150177 0.6322464 0.868563 0.7046387 0.3582759 0.7252203 0.5163978 0.3651484 0.5163978 0.3582759 0.4222794
#         X10      X40       X23       X25       X22      X20        X6      X18      X12       X39       X19       X11       X30       X34
# 1 0.4222794 0.507107 0.6206017 0.4536844 0.4536844 0.654303 0.5126421 0.338204 0.338204 0.5126421 0.5393651 0.5804794 0.7270723 0.5242481
#        X28       X31       X26       X29       X33       X24       X36      X37       X41       X27       X32       X35       X38
# 1 0.735765 0.5242481 0.7270723 0.8749704 0.5715592 0.4933355 0.4933355 0.574123 0.7443697 0.6333863 0.6333863 0.7296583 0.6709442
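summarize_each returns a one-row data frame rather than a named vector (summarize_each and funs have since been deprecated in newer dplyr releases). If you want that same shape without dplyr, a base R sketch, shown on mtcars:

```r
n <- 2
# lapply() produces one value per column; as.data.frame() turns that list
# into a one-row data frame with the original column names.
nth_row <- as.data.frame(lapply(mtcars, function(x) sort(x)[n]))
nth_row
```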
