14

I would like to find the closest value to column x3 below.

data=data.frame(x1=c(24,12,76),x2=c(15,30,20),x3=c(45,27,15))
data
  x1 x2 x3
1 24 15 45
2 12 30 27
3 76 20 15

So desired output will be

Closest_Value_to_x3
   24
   30
   20

Please help. Thank you

markus
  • 25,843
  • 5
  • 39
  • 58
melik
  • 1,268
  • 3
  • 21
  • 42

4 Answers4

12

Use max.col(-abs(data[, 3] - data[, -3])) to find the column positions of the closest values and use this result as part of a matrix to extract desired values from your data. The matrix is returned by cbind

col <- 3
data[, -col][cbind(1:nrow(data),
                   max.col(-abs(data[, col] - data[, -col])))]
#[1] 24 30 20
markus
  • 25,843
  • 5
  • 39
  • 58
  • Nice answer. Although I don't think the outside `[, 1:2]` subset is necessary since you've already done that subset inside the `abs()` call. – Rich Scriven Jan 23 '19 at 20:25
  • @RichScriven Thanks for your comment. I guess I included it in case OP changes their mind and want's to find the closest values to, say, column 1, in which case we'd need the `[, 2:3]` subset. – markus Jan 23 '19 at 20:32
4

A tidyverse solution:

data %>%
  rowid_to_column() %>%
  gather(var, val, -c(x3, rowid)) %>%
  mutate(temp = x3 - val) %>%
  group_by(rowid) %>%
  filter(abs(temp) == min(abs(temp))) %>%
  ungroup() %>%
  select(val)

    val
  <dbl>
1    24
2    30
3    20

First, it adds a row ID. Second, it transforms the data from wide to long. Third, it calculates the difference between "x3" and the other variables. Finally, it groups by the row ID and keeps the rows where the absolute difference is the smallest.

Or:

data %>%
  rowid_to_column() %>%
  gather(var, val, -c(x3, rowid)) %>%
  mutate(temp = x3 - val) %>%
  group_by(rowid) %>%
  filter(abs(temp) == min(abs(temp))) %>%
  ungroup() %>%
  pull(val)

[1] 24 30 20

Or using an approach originally proposed by @markus (it assumes that your columns are named "x"):

data %>%
 mutate(temp = paste0("x", max.col(-abs(.[, -3] - .[, 3])))) %>%
 rowwise() %>%
 summarise(val = eval(as.symbol(temp)))

    val
  <dbl>
1   24.
2   30.
3   20.

First, it is assessing the column index of the variable where the absolute difference in regard to "x3" is the smallest and combines it with "x". Then, it evaluates the combination of x and column index as a variable and returns the appropriate value.

Also borrowing the idea from @markus (not assuming that your columns are named "x"):

data %>%
 mutate(temp = max.col(-abs(.[, -3] - .[, 3]))) %>%
 rowwise %>%
 mutate(temp = names(.)[[temp]]) %>%
 summarise(val = eval(as.symbol(temp)))

First, it is assessing the column index of the variable where the absolute difference in regard to "x3" is the smallest. Second, it returns the column name based on the column index. Finally, it evaluates it as a variable and returns the appropriate value.

Or a variant where you can reference the "x3" variable by its name and not by column index (the basic idea still from @markus):

data %>%
 mutate(temp = max.col(-abs(.[, !grepl("x3", colnames(.))] - .[, grepl("x3", colnames(.))]))) %>% 
 rowwise %>%
 mutate(temp = names(.)[[temp]]) %>%
 summarise(val = eval(as.symbol(temp)))
tmfmnk
  • 38,881
  • 4
  • 47
  • 67
  • I like that I can always count on you for a tidyverse approach but sometimes they look so complex and intimidating. Great all the same! – NelsonGon Jan 23 '19 at 14:19
  • 1
    @NelsonGon sometimes it gets really verbose, that is true. But that is also true that the `tidyverse` aprroaches in general are not the ones with the shortest code. Anyway, thank you for your compliment :) – tmfmnk Jan 23 '19 at 14:30
  • 1
    This is *a* tidyverse solution, not the only one, and not one I'd write. You can make the code a lot less verbose by following the natural logic of the other answer, no need to reshape the data. – Konrad Rudolph Jan 23 '19 at 17:19
  • @Konrad Rudolph I agree with you, however, I'm not presenting this post as _the_ `tidyverse` solution. Also, I'm not saying that this particular problem cannot be solved using `tidyverse` with a lot less verbose code. I'm just saying that the `tidyverse` solutions are, in general, not the ones with the least verbose code. – tmfmnk Jan 23 '19 at 20:49
  • 1
    @tmfmnk You’re completely right but just to clarify, I think in this case you’re torturing dplyr, and it confesses to anything. You could do a simpler single `mutate`: `data %>% mutate(d = .[, -3][cbind(row_number(), max.col(- abs(.[, 3] - .[, -3])))])` — I’d be tempted to introduce a temporary column to hold the result of `max.col` but otherwise that’s it. – Konrad Rudolph Jan 24 '19 at 09:57
  • 1
    "I think in this case you’re torturing dplyr, and it confesses to anything", this could be on `fortunes`. – RLave Jan 24 '19 at 10:56
  • @Konrad Rudolph the code in your post is a literal transcription of the code from markus. It is a single line and it does the job, yes, but, in my opinion, it is not representing the `tidyverse` mentality. Anyway, thank you for your remarks :) – tmfmnk Jan 24 '19 at 17:03
  • 1
    @tmfmnk “It’s not representative of the tidyverse mentality” — Well, more precisely, it’s not representative of the *dplyr* mentality, because that package is specifically designed to handle *tidy data* (i.e. long data). And OP’s problem is specifically a *matrix* computation problem, which dplyr is intentionally not designed to handle. And that’s kinda my point. – Konrad Rudolph Jan 24 '19 at 17:37
3

Here is another approach using matrixStats

x <- as.matrix(data[,-3L])
y <- abs(x - .subset2(data, 3L))
x[matrixStats::rowMins(y) == y]
# [1] 24 30 20

Or in base using vapply

x <- as.matrix(data[,-3L])
y <- abs(x - .subset2(data, 3L))
vapply(1:nrow(data), 
       function(k) x[k,][which.min(y[k,])], 
       numeric(1))
# [1] 24 30 20
niko
  • 5,253
  • 1
  • 12
  • 32
2

Define a function closest_to_3 that operates on a vector and returns the value in the vector that's closest to the third member:

closest_to_3 <- function(v) v[-3][which.min(abs( v[-3]-v[3] ))]

(The idiom v[-3] deletes the 3rd member from v.) Then apply this function to each row of your data frame:

apply(data, 1, closest_to_3)
#[1] 24 30 20
grand_chat
  • 431
  • 3
  • 3