What I'm trying to achieve in R is the following: given a table (a data frame in my case), I want to get the row with the lowest price for each unique combination of two columns (Key and Feature1).
For example, given the following table:
+-----+----------+-------+----------+----------+
| Key | Feature1 | Price | Feature2 | Feature3 |
+-----+----------+-------+----------+----------+
| AAA |        1 |   100 | whatever | whatever |
| AAA |        1 |   150 | whatever | whatever |
| AAA |        1 |   200 | whatever | whatever |
| AAA |        2 |   110 | whatever | whatever |
| AAA |        2 |   120 | whatever | whatever |
| BBB |        1 |   100 | whatever | whatever |
+-----+----------+-------+----------+----------+
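For anyone who wants to try this out, the table above can be rebuilt as a data frame roughly like this. It is a minimal sketch: the name `data` matches the code further down, and the column types (character keys, numeric prices) are an assumption, since the table does not show them.

```r
# Reproducible version of the example table.
# "whatever" is just a placeholder value for the extra columns.
data <- data.frame(
  Key      = c("AAA", "AAA", "AAA", "AAA", "AAA", "BBB"),
  Feature1 = c(1, 1, 1, 2, 2, 1),
  Price    = c(100, 150, 200, 110, 120, 100),
  Feature2 = "whatever",   # length-1 values are recycled to all rows
  Feature3 = "whatever",
  stringsAsFactors = FALSE
)
```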
I want a result that looks like:
+-----+----------+-------+----------+----------+
| Key | Feature1 | Price | Feature2 | Feature3 |
+-----+----------+-------+----------+----------+
| AAA |        1 |   100 | whatever | whatever |
| AAA |        2 |   110 | whatever | whatever |
| BBB |        1 |   100 | whatever | whatever |
+-----+----------+-------+----------+----------+
So I'm working on a solution along the lines of:
    s <- lapply(split(data, list(data$Key, data$Feature1)), function(chunk) {
      chunk[which.min(chunk$Price), ]
    })
But the result is a list of one-row data frames rather than a single data frame, so I need to unsplit the result. It also seems very slow. How can I improve this logic?
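For reference, the one-row pieces can be stacked back into a single data frame with `do.call(rbind, ...)`. The following is only a sketch of that recombination step in base R, not a claim that it is the fastest approach. The sample `data` is rebuilt here so the snippet runs on its own, and the names `chunks` and `result` are made up for illustration.

```r
# Sample data matching the example table (column types are an assumption).
data <- data.frame(
  Key      = c("AAA", "AAA", "AAA", "AAA", "AAA", "BBB"),
  Feature1 = c(1, 1, 1, 2, 2, 1),
  Price    = c(100, 150, 200, 110, 120, 100),
  Feature2 = "whatever",
  Feature3 = "whatever",
  stringsAsFactors = FALSE
)

# drop = TRUE skips Key/Feature1 combinations that never occur (e.g. BBB with 2),
# which would otherwise produce empty chunks.
chunks <- split(data, list(data$Key, data$Feature1), drop = TRUE)

# Keep the cheapest row of each chunk; all columns of that row are retained.
s <- lapply(chunks, function(chunk) chunk[which.min(chunk$Price), ])

# Stack the one-row data frames back into a single data frame.
result <- do.call(rbind, s)

# Sort by the grouping columns and drop the group names rbind puts in the row names.
result <- result[order(result$Key, result$Feature1), ]
rownames(result) <- NULL
```

Note that `which.min` returns only the first minimum, so ties on Price within a group yield a single row.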
I've seen solutions pointing in the direction of the data.table package. Should I rewrite this using that package?
Update
Great answers guys, thanks! However, my original data frame contains more columns (Feature2 ...) and I need them all back after the filtering. The rows that do not have the lowest price (for the combination of Key/Feature1) can be discarded, so I'm not interested in their values for Feature2 / Feature3.