I occasionally need to extract specific rows from a data.frame based on the values of one of its variables. R has built-in functions for the maximum (which.max()) and minimum (which.min()) that let me extract those rows easily.
Is there an equivalent for median? Or is my best bet to just write my own function?
Here's an example data.frame and how I would use which.max() and which.min():
set.seed(1) # so you can reproduce this example
dat = data.frame(V1 = 1:10, V2 = rnorm(10), V3 = rnorm(10),
                 V4 = sample(1:20, 10, replace = TRUE))
# To return the first row, which contains the max value in V4
dat[which.max(dat$V4), ]
# To return the seventh row, which contains the min value in V4
dat[which.min(dat$V4), ]
Because this particular example has an even number of observations, the median falls between two values, so I would need two rows returned: in this case, rows 2 and 10.
Update
It would seem that there is not a built-in function for this. As such, using the reply from Sacha as a starting point, I wrote this function:
which.median = function(x) {
  if (length(x) %% 2 != 0) {
    # odd length: the median is an actual element of x
    which(x == median(x))
  } else {
    # even length: take the two middle values of the sorted vector
    a = sort(x)[c(length(x)/2, length(x)/2 + 1)]
    c(which(x == a[1]), which(x == a[2]))
  }
}
I'm able to use it as follows:
# make one data.frame with an odd number of rows
dat2 = dat[-10, ]
# Median rows from 'dat' (even number of rows) and 'dat2' (odd number of rows)
dat[which.median(dat$V4), ]
dat2[which.median(dat2$V4), ]
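For comparison, here is a minimal sketch of a more compact variant. The name which.median2 is hypothetical (it is not part of base R), and the sketch assumes that x contains no NA values and that each matching row should be reported once, in row order, even when the two middle values are equal.

```r
# Hypothetical helper (not part of base R): indices of the middle value(s) of x.
# Assumes x has no NA values, since sort() drops them by default.
which.median2 <- function(x) {
  n <- length(x)
  # Positions of the middle element(s) in sorted order:
  # a single position when n is odd, two positions when n is even.
  mid <- sort(x)[unique(c(ceiling(n / 2), floor(n / 2) + 1))]
  # %in% yields each matching index once, in row order,
  # even when both middle values are the same.
  which(x %in% mid)
}
```

It would be used the same way, for example dat[which.median2(dat$V4), ]. Unlike the version above, the indices come back in row order rather than grouped by middle value.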
Are there any suggestions to improve this?