Is there a way to find an entry in a dataset with the lowest value in one column and the greatest value in another?

Question

Im trying to find the entry in a dataset that has the highest value in one column, and the lowest value in another. The dataset that I'm using consists of CPUs as entries with their specs as the variables or columns of the dataset (with columns made to represent the 'Speed' (or clockwork of each CPU in GHz), and 'Price' of each entry. This is currently what I have for code, but it only produces the output shown. I am trying to find the entry in the dataset with the smallest price and the greatest Speed. How would I change this so that it gives me a definite answer?

library(tidyverse)
library(readxl)
library(dplyr)
CPU_prices <- read_csv("CPU prices - Desktop-Mobile.csv")

CPU_prices[which.min(CPU_prices$Price),which.max(CPU_prices$"Speed")]

[reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) — Ben373, May 08 '21 at 22:12
Images are not the right way to share data/code. Add them in a reproducible format which is easier to copy. Read about [how to give a reproducible example](http://stackoverflow.com/questions/5963269). — Ronak Shah, May 09 '21 at 05:50

score 0 · Answer 1 · answered May 09 '21 at 01:46

A few things:

Your code using which.min and which.max is mistakingly providing the which.min as the row index, and the which.max as the column index:
```
CPU_prices[which.min(CPU_prices$Price),which.max(CPU_prices$"Speed")]
```
The former is okay, but the latter makes no sense: if (for example) the which.min(.) resulted in a 5, and the which.max(.) resulted in a 9, then you'd be trying to look at CPU_prices[5,9], which will fail given that the frame only has 7 columns.

Given that (in this example), which row would you expect to discover/return?
It is unlikely to have one column's maxima and another column's minima occur in the same row. While it certainly can happen (with perfect negative correlation, or just nice convenience), due to many things it will not be surprising if they don't occur on the same exact row.

As an example, mtcars: we are fortunate that the minimum displacement and maximum mpg both occur on the same row:
```
which.min(mtcars$disp)
# [1] 20
which.max(mtcars$mpg)
# [1] 20
```
but if you were instead trying to find the minimum hp and the maximum mpg, you would not have such luck:
```
which.min(mtcars$hp)
# [1] 19
which.max(mtcars$mpg)
# [1] 20
```
In this case, which row do you want returned? If your answer is not always/immediately one very specific column, and is instead "it depends", then ... it's very difficult for R to know what "depends" really means.
Even ordering by the two columns is imperfect:
```
order(CPU_prices$Price, -CPU_prices$Speed)
# or
dplyr::arrange(CPU_prices, Price, -Speed)
```
Both of those will likely only sort by Price, ignoring Speed. The only time the ordering of Speed is a concern is when there are identical Price values. So this way has its flaws.

Finding the "best" (min of one, max of another) is often a multi-object decision analysis thing. This means you need to find some (arithmetic? parametric?) combination of the two values that produces a representative single value. For instance, one might take the absolute difference between a Price and the least-expensive piece, add to that the absolute difference between the Speed and the fastest piece. Something like:

mtcars$Value <- with(mtcars, abs(mpg - max(mpg)) + abs(hp - min(hp)))
mtcars[order(mtcars$Value),c("mpg","hp","Value")]
#                      mpg  hp Value
# Honda Civic         30.4  52   3.5
# Toyota Corolla      33.9  65  13.0
# Fiat 128            32.4  66  15.5
# Merc 240D           24.4  62  19.5
# Fiat X1-9           27.3  66  20.6
# ...truncated...
# Chrysler Imperial   14.7 230 197.2
# Duster 360          14.3 245 212.6
# Camaro Z28          13.3 245 213.6
# Ford Pantera L      15.8 264 230.1
# Maserati Bora       15.0 335 301.9

This is a flawed approach, since they are not on the same scale or the same units ... so you might try

mtcars$Value2 <- with(mtcars, abs(mpg - max(mpg))/diff(range(mpg)) + abs(hp - min(hp))/diff(range(hp)))
mtcars[order(mtcars$Value2),c("mpg","hp","Value","Value2")]
#                      mpg  hp Value    Value2
# Toyota Corolla      33.9  65  13.0 0.0459364
# Fiat 128            32.4  66  15.5 0.1132998
# Honda Civic         30.4  52   3.5 0.1489362
# Fiat X1-9           27.3  66  20.6 0.3303210
# Lotus Europa        30.4 113  64.5 0.3644839
# ...truncated...
# Ford Pantera L      15.8 264 230.1 1.5193294
# Cadillac Fleetwood  10.4 205 176.5 1.5406360
# Camaro Z28          13.3 245 213.6 1.5585745
# Lincoln Continental 10.4 215 186.5 1.5759717
# Maserati Bora       15.0 335 301.9 1.8042553

This also has its flaws. Perhaps you consider having slightly more hp to be more important, so you can weight it more (2*.) or transform it (sqrt, log, or .^2) or something else.

mtcars$Value2 <- with(mtcars,
  abs(mpg - max(mpg))/diff(range(mpg)) +
  2 * abs(hp - min(hp))/diff(range(hp)))

Perhaps the ratio of the two differences is preferred (compensating for the 0 that will happen for the respective min/max values).

There are lots of ways to combine them, depending on the scale, importance, preference, and ... other things that are known only to you. (And MODA tends to be a more "soft solution", a way to find a good solution given opposing objectives. I say "soft" because it is an objective numeric minimization/maximization based on subjective weights applied to informed but often imperfect transforms.)

Is there a way to find an entry in a dataset with the lowest value in one column and the greatest value in another?

1 Answers1