0

Probably this is not that complex, but I couldn't figure out how to write a concise title explaining it:

I'm trying to use the aggregate function in R to return (1) the lowest value of a given column (val) by category (cat.2) in a data frame and (2) the value of another column (cat.1) on the same row. I know how to do part #1, but I can't figure out part #2.

The data:

cat.1<-c(1,2,3,4,5,1,2,3,4,5)
cat.2<-c(1,1,1,2,2,2,2,3,3,3)
val<-c(10.1,10.2,9.8,9.7,10.5,11.1,12.5,13.7,9.8,8.9)
df<-data.frame(cat.1,cat.2,val)

> df
   cat.1 cat.2  val
1      1     1 10.1
2      2     1 10.2
3      3     1  9.8
4      4     2  9.7
5      5     2 10.5
6      1     2 11.1
7      2     2 12.5
8      3     3 13.7
9      4     3  9.8
10     5     3  8.9

I know how to use aggregate to return the minimum value for each cat.2:

> aggregate(df$val, by=list(df$cat.2), FUN=min)
  Group.1   x
1       1 9.8
2       2 9.7
3       3 8.9

The second part of it, which I can't figure out, is to return the value in cat.1 on the same row of df where aggregate found min(df$val) for each cat.2. Not sure I'm explaining it well, but this is the intended result:

> ...
  Group.1   x  cat.1
1       1 9.8      3
2       2 9.7      4
3       3 8.9      5

Any help much appreciated.

Bolio
  • 249
  • 1
  • 9

1 Answers1

2

If we need the output after the aggregate, we can do a merge with original dataset

merge(aggregate(df$val, by=list(df$cat.2), FUN=min), 
       df,  by.x = c('Group.1', 'x'), by.y = c('cat.2', 'val'))
#  Group.1   x cat.1
#1       1 9.8     3
#2       2 9.7     4
#3       3 8.9     5

But, this can be done more easily with dplyr by using slice to slice the rows with the min value of 'val' after grouping by 'cat.2'

library(dplyr)
df %>% 
    group_by(cat.2) %>% 
    slice(which.min(val))
# A tibble: 3 x 3
# Groups:   cat.2 [3]
# cat.1 cat.2   val
#  <dbl> <dbl> <dbl>
#1     3     1   9.8
#2     4     2   9.7
#3     5     3   8.9

Or with data.table

library(data.table)
setDT(df)[, .SD[which.min(val)], cat.2]

Or in base R, this can be done with ave

df[with(df, val == ave(val, cat.2, FUN = min)),]
#   cat.1 cat.2 val
#3      3     1 9.8
#4      4     2 9.7
#10     5     3 8.9

akrun
  • 874,273
  • 37
  • 540
  • 662