My dataset, named ds, is a matrix with three columns and 4000+ observations. The three columns in ds are:

name v2 f1
  1. name is character
  2. v2 is numeric
  3. f1 is factor with 54 levels

I want to find the position of the minimum of v2 within factor level x. I tried to use tapply as follows:

tapply(ds$v2, ds$f1 == x, which.min)

The answer I get is something like this:

FALSE  TRUE 
 2821    19

I presumed that 19 is the absolute position in my dataset and if I want to find the name of the observation all I need to do is

ds[19, 1]

But apparently that is incorrect. I have understood that 19 corresponds to the relative position i.e. it is the 19th observation for factor x.

So my question is: How can I find the absolute position for min value of factor x?

Rachit Kinger
  • Please provide a small excerpt of your dataset and the desired output based on it, and your question will become a good one. – nicola Apr 03 '17 at 13:55
  • I guess `tapply` is pretty messy for this, something like `tapply(1:nrow(iris), iris$Species, function(i) i[which.min(iris$Sepal.Length[i])])`. If you are willing to use a package like dplyr or data.table, some more intuitive syntax is available, though. Alternatively, the `by()` function may help: http://stackoverflow.com/a/24070835/ – Frank Apr 03 '17 at 14:25

1 Answer

tapply applies the function to each unique value of its second argument, so instead of ds$f1 == x you probably want just ds$f1:

tapply(ds$v2, ds$f1, which.min)

Here is an example with the iris data set that comes with R:

tapply(iris$Sepal.Length, iris$Species, which.min)

EDIT:

However, as you noted, this will give you the position within the subsetted data and not the absolute position.
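
A quick way to see the problem, as a rough sketch: it relies on the built-in iris data being ordered by species with 50 rows each, and the name rel_pos is only for illustration.

```r
# Positions returned here count rows within each species, not within iris
rel_pos <- tapply(iris$Sepal.Length, iris$Species, which.min)

# Each position is at most 50, so indexing the full table with them
# only ever selects rows from the first block of 50 (setosa)
iris[rel_pos, c("Sepal.Length", "Species")]
```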

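I don't think it's possible to get the absolute position from tapply this way, because which.min only sees the values of a single vector for one group at a time. If you want to work with whole rows instead, you can use this kind of approach; split keeps the original row names, so they can be used to index the full table: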

d <- split(iris, iris$Species)
row_positions <- sapply(d, function(x) rownames(x[which.min(x$Sepal.Length), ]))
iris[row_positions, ]
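
For a single factor level, as in the question, one possible alternative is to keep the absolute row numbers of that level and map the relative position back onto them. This is only a sketch: the small ds below is a made-up stand-in for the real data, and idx, abs_pos and all_pos are names chosen for illustration.

```r
# Hypothetical stand-in for ds; the real data are not shown in the question
ds <- data.frame(
  name = c("a", "b", "c", "d", "e", "f"),
  v2   = c(5, 3, 9, 1, 4, 2),
  f1   = factor(c("x", "y", "x", "y", "x", "y")),
  stringsAsFactors = FALSE
)
x <- "x"

# Absolute row numbers of the observations in level x
idx <- which(ds$f1 == x)

# which.min gives the position within idx; indexing idx maps it back
abs_pos <- idx[which.min(ds$v2[idx])]
ds[abs_pos, "name"]

# The same idea for every level at once: split the row numbers, not the values
all_pos <- sapply(split(seq_len(nrow(ds)), ds$f1),
                  function(i) i[which.min(ds$v2[i])])
ds[all_pos, ]
```

Because the row numbers themselves are split, this does not depend on the data frame having default row names.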
fmic_
  • This still gives row numbers within subgroups instead of the full table. Try `iris[tapply(iris$Sepal.Length, iris$Species, which.min), ]` to see the problem. – Frank Apr 03 '17 at 14:19
  • Thanks for pointing this out @Frank, I misunderstood the question. I edited my answer. – fmic_ Apr 03 '17 at 14:46
  • Thanks @sinQueso I did something similar and that helped. I split the ds using split like this: `y <- split(ds, ds$f1 == x)$'TRUE'` . This created a matrix based on the factor. I ran which.min and then got the desired output. – Rachit Kinger Apr 03 '17 at 16:31