Get minimum grouped by unique combination of two columns

Question

What I'm trying to achieve in R is the following: given a table (data frame in my case) - I want to be get the lowest price for each unique combination of two columns.

For example, given the following table:

+-----+-----------+-------+----------+----------+
| Key | Feature1  | Price | Feature2 | Feature3 |
+-----+-----------+-------+----------+----------+
| AAA |         1 |   100 | whatever | whatever |
| AAA |         1 |   150 | whatever | whatever |
| AAA |         1 |   200 | whatever | whatever |
| AAA |         2 |   110 | whatever | whatever |
| AAA |         2 |   120 | whatever | whatever |
| BBB |         1 |   100 | whatever | whatever |
+-----+-----------+-------+----------+----------+

I want a result that looks like:

+-----+-----------+-------+----------+----------+
| Key | Feature1  | Price | Feature2 | Feature3 |
+-----+-----------+-------+----------+----------+
| AAA |         1 |   100 | whatever | whatever |
| AAA |         2 |   110 | whatever | whatever |
| BBB |         1 |   100 | whatever | whatever |
+-----+-----------+-------+----------+----------+

So I'm working on a solution along the lines of:

s <- lapply(split(data, list(data$Key, data$Feature1)), function(chunk) { 
        chunk[which.min(chunk$Price),]})

But the result is a 1 x n matrix - so I need to unsplit the result. Also - it seems very slow. How can I improve this logic? I've seen solutions pointing in the directions of the data.table package. Should I re-write using that package?

Update

Great answers guys - thanks! However - my original dataframe contains more columns ( Feature2 ... ) and I need them all back after the filtering. The rows that do not have the lowest price ( for the combination of Key/Feature1 ) can be discarded, so I'm not interested in their values for Feature2 / Feature3

With what logic the values of the other columns are to be taken? If for instance `Feature2` has different values for the same key-feature1, which value must be included in the output? — nicola, Jul 10 '15 at 15:40
The value belonging to the lowest price. So this thing needs to work as a row filter. So the 'whatever' of AAA-1, AAA-2, BBB-1. Rest of the rows can be discarded. — Jochen van Wylick, Jul 10 '15 at 15:43

jeremycg · Accepted Answer · 2015-07-10T15:41:37.873

3

You can use the dplyr package:

library(dplyr)

data %>% group_by(Key, Feature1) %>%
         slice(which.min(Price))

edited Jul 10 '15 at 15:41

answered Jul 10 '15 at 15:22

jeremycg

24,657
5
63
74

Great works - but I need to get all columns back in the result. I simplified the example a bit. In reality the data contains more columns, which I need in the result. – Jochen van Wylick Jul 10 '15 at 15:32

user227710 · Answer 2 · 2015-07-10T16:06:36.757

Since you referred to data.table package, I provide here the solution using that package:

library(data.table)
setDT(df)[,.(Price=min(Price)),.(Key, Feature1)] #initial question
setDT(df)[,.SD[which.min(Price)],.(Key, Feature1)] #updated question

df is your sample data.frame.

Update: Test using mtcars data

df<-mtcars
library(data.table)
setDT(df)[,.SD[which.min(mpg)],by=am]
   am  mpg cyl disp  hp drat   wt  qsec vs gear carb
1:  1 15.0   8  301 335 3.54 3.57 14.60  0    5    8
2:  0 10.4   8  472 205 2.93 5.25 17.98  0    3    4

score 1 · Answer 3 · answered Jul 10 '15 at 15:25

1

The base R solution would be aggregate(Price ~ Key + Feature1, data, FUN = min)

answered Jul 10 '15 at 15:25

maccruiskeen

2,748
2
13
23

Very elegant - but I need to get all columns back in the result. I simplified the example a bit. In reality the data contains more columns, which I need in the result. – Jochen van Wylick Jul 10 '15 at 15:32
Do you mean you want the minimum value back in your original dataframe? If that's the case use `ave(data$Price, data$Key, data$Feature, FUN = min)`. – maccruiskeen Jul 10 '15 at 15:36
No - see the updated question - I only want the row with the lowest value ( for the unique combination of Key + Feature1 ) - but with all the original values. I tried your code and it returns only 3 columns: Key, Feature1 and Price - but I also need all the other original columns. – Jochen van Wylick Jul 10 '15 at 15:38
Are the data in other columns identical within each combination of Key and Feature1? If not do you want their mean? – maccruiskeen Jul 10 '15 at 15:45
It's irrelevant. If the row does not contain the minimum price ( for the combination of Key + Feature1 ) the row can be discareded. It needs to work as a filter on the rows. - thanks for all your effort btw. – Jochen van Wylick Jul 10 '15 at 15:51
1

Ah I see. Well the `dplyr` solution by jeremycg looks good. A `data.table` one would to do the same would be `setDT(data)[, lapply(.SD, min), by = list(Key, Feature1)]`, provided you did `data <- data.table(data)`. – maccruiskeen Jul 10 '15 at 15:54
Sorry for being unclear - I didn't expect my simplification of the problem would result in not giving enough information to come up with a solution. – Jochen van Wylick Jul 10 '15 at 15:57
@choff: I already posted a solution from `data.table` package. If you think that is not working, please let me know. – user227710 Jul 10 '15 at 16:05
1

@user227710 I see your update now. Your solution works. – maccruiskeen Jul 10 '15 at 16:07

score 0 · Answer 4 · edited May 23 '17 at 11:44

0

Using R base aggregate

> aggregate(Price~Key+Feature1, min, data=data)
  Key Feature1 Price
1 AAA        1   100
2 BBB        1   100
3 AAA        2   110

See this post for other alternatives.

edited May 23 '17 at 11:44

Community

1
1

answered Jul 10 '15 at 15:27

Jilber Urbina

58,147
10
114
138

Get minimum grouped by unique combination of two columns

4 Answers4