1

This is a simplified example. I have a data frame with two variables like this:

a <- c(1,1,1,2,2,2,3,3,6,7,4,5,5,8)
b <- c(5,10,4,2,8,4,6,9,12,3,7,4,1,7)
D <- data.frame(a,b)

As you can see, `a` takes 8 distinct values, but some of them are repeated, so the data frame has 14 observations. I want to create a data frame with 8 observations in which each value of `a` appears once and `b` is the minimum over the rows sharing that `a`; i.e., the result should look like:

  a  b
1 1  4
2 2  2
3 3  6
4 6 12
5 7  3
6 4  7
7 5  1
8 8  7
Novic
  • 1
    Pick your favorite method from the FAQ [How to sum a variable by group](https://stackoverflow.com/q/1660124/903061), and then replace `sum` with `min` to get the minimum instead. – Gregor Thomas Jul 23 '18 at 16:53
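
A minimal sketch of what the comment above suggests, using base R only: the same grouped call that sums `b` within each value of `a` returns the per-group minimum once `sum` is swapped for `min`. The helper name `summarise_by_a` is hypothetical and is not taken from the linked FAQ.

```r
# Data from the question
a <- c(1,1,1,2,2,2,3,3,6,7,4,5,5,8)
b <- c(5,10,4,2,8,4,6,9,12,3,7,4,1,7)
D <- data.frame(a, b)

# Hypothetical helper: apply any summary function to b within each value of a
summarise_by_a <- function(data, fun) {
  aggregate(b ~ a, data = data, FUN = fun)
}

group_sums <- summarise_by_a(D, sum)  # grouped sum, the FAQ's use case
group_mins <- summarise_by_a(D, min)  # same call with min swapped in
group_mins
```

Note that `aggregate` returns the groups sorted by `a`, not in the order they first appear in `D`.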

4 Answers

3

Here's how to do it with base R:

#both lines give the same minima; the first names the result column x, the second b
aggregate(D$b, by = D["a"], FUN = min)
aggregate(b ~ a, data = D, FUN = min)

Here's how to do it with data.table:

library(data.table)
setDT(D)
D[ , .(min(b)), by=a]

Here's how to do it with tidyverse functions:

library(tidyverse) #or just library(dplyr)
#the pipe must end a line, not start one, or R stops parsing after the first line
D %>%
  group_by(a) %>%
  summarize(min(b))
DanY
  • 1
    Using formula notation is a more readable way to call `aggregate`: `aggregate(b~a, data=D, FUN = min)` – Jilber Urbina Jul 23 '18 at 17:04
  • 1
    @JilberUrbina - You're probably right. I loathe `aggregate()`. It's clearly a function written without data.frames in mind, yet aggregation is a common thing to do with data stored in a data.frame. – DanY Jul 23 '18 at 17:23
2

Using a base R approach:

> D2  <- D[order(D$a, D$b ), ]
> D2  <- D2[ !duplicated(D2$a), ]
> D2
   a  b
3  1  4
4  2  2
7  3  6
11 4  7
13 5  1
9  6 12
10 7  3
14 8  7
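
The output above is sorted by `a`, whereas the table in the question lists the groups in order of first appearance (1, 2, 3, 6, 7, 4, 5, 8). If that order matters, one possible follow-up is to reindex the result with `match()`. This is a sketch based on an assumption about what the asker wants, not part of the original answer, and the name `D3` is introduced here only for illustration.

```r
# Data from the question
a <- c(1,1,1,2,2,2,3,3,6,7,4,5,5,8)
b <- c(5,10,4,2,8,4,6,9,12,3,7,4,1,7)
D <- data.frame(a, b)

# Sort so the smallest b comes first within each a, then keep the first row per a
D2 <- D[order(D$a, D$b), ]
D2 <- D2[!duplicated(D2$a), ]

# Reorder the groups by their first appearance in the original data
D3 <- D2[match(unique(D$a), D2$a), ]
D3
```

The row names of `D3` still point at the original rows of `D` that hold each minimum; `rownames(D3) <- NULL` resets them to 1 through 8 if that is preferred.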
Jilber Urbina
  • This solution certainly works for this very particular question, but I would advise readers to look at other solutions posted below that use `aggregate`, `data.table`, or `dplyr`/`tidyverse` functions as those solutions allow you to easily switch out the function of interest (i.e., `sum` instead of `min`). – DanY Jul 23 '18 at 17:29
  • I believe it's inefficient and convoluted to sort and remove duplicates when it's the most basic case of aggregation. – moodymudskipper Jul 24 '18 at 10:18
  • 1
    @Moody_Mudskipper I posted an alternative that avoids `aggregate` because other answers already use `aggregate`. This is only another point of view. – Jilber Urbina Jul 24 '18 at 14:48
  • That's fair, and I didn't downvote, but I think it was worth mentioning – moodymudskipper Jul 24 '18 at 14:53
1

A base R option would be

aggregate(b ~ a, D, min)
akrun
0

library(dplyr)

D <- D %>% group_by(a) %>% summarize(min(b))

Ankur