L <- c(0, 500, 1000, 2000, 3000, 5000, 10000, 20000, 50000)
fli.1 <- c(0, 0.1, 0.2, 0.4, 0.8, 0.9, 1, 1.2, 1.8)
fli.2 <- c(0, 0.11, 0.21, 0.42, 0.84, 0.95, 1.05, 1.26, 1.89)
fli.3 <- c(0, 0.11, 0.22, 0.44, 0.88, 0.99, 1.1, 1.32, 1.98)
fli.4 <- c(0, 0.12, 0.23, 0.46, 0.93, 1.04, 1.16, 1.39, 2.08)
fli.5 <- c(0, 0.12, 0.24, 0.49, 0.97, 1.09, 1.22, 1.46, 2.19)
data <- data.frame(L, fli.1, fli.2, fli.3, fli.4, fli.5)


d <- data.frame(
  quantity = c(300, 368, 568, 20, 1000, 37659, 45000, 2500, 4500, 78453, 1200, 1589),
  fli = c("fli.1", "fli.1", "fli.4", "fli.5", "fli.2", "fli.2", "fli.5", "fli.1", "fli.2", "fli.2", "fli.3", "fli.4")
)

I need to create another column in the data frame d such that each entry takes its value from the lookup table data: it should select the row whose L is the largest value not exceeding the quantity, and the column given by fli. For example, for quantity 37659 with fli.2 it would be the 8th row (L = 20000) and the fli.2 column, which is 1.26.

I have tried using a matrix, but it takes too much time. Note that this is sample data; I need to apply it to a very large dataset.

Mohit
    Isn't this just a minor revision of your prior question with altered sample data? https://stackoverflow.com/questions/76461482/r-code-for-mapping-vertically-and-horizontally-just-like-offset-does-in-excel/76461810 – Jon Spring Jun 13 '23 at 05:06

3 Answers


As L is sorted in data, you can use findInterval to get the row index and match to get the column index, then cbind the indices and use the resulting matrix to subset data.

d$value <-
  data[cbind(findInterval(d$quantity, data$L), match(d$fli, names(data)))]

d
#   quantity   fli value
#1       300 fli.1  0.00
#2       368 fli.1  0.00
#3       568 fli.4  0.12
#4        20 fli.5  0.00
#5      1000 fli.2  0.21
#6     37659 fli.2  1.26
#7     45000 fli.5  1.46
#8      2500 fli.1  0.40
#9      4500 fli.2  0.84
#10    78453 fli.2  1.89
#11     1200 fli.3  0.22
#12     1589 fli.4  0.23
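
To see what the indexing does, the 37659 / fli.2 case from the question can be worked through step by step with the same data as above:

findInterval(37659, data$L)  # 8: the largest L not exceeding 37659 is 20000
match("fli.2", names(data))  # 3: fli.2 is the 3rd column of data
data[cbind(8, 3)]            # 1.26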

Benchmark

library(dplyr)

bench::mark(check=FALSE,
"Jon Spring" = {d |>
  left_join(tidyr::pivot_longer(data, -L, names_to = "fli"), 
            join_by(fli,  closest(quantity >= L)))},
GKi = cbind(d, value=data[cbind(findInterval(d$quantity, data$L), match(d$fli, names(data)))])
)
#  expression      min   median `itr/sec` mem_alloc `gc/sec` n_itr  n_gc
#  <bch:expr> <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl> <int> <dbl>
#1 Jon Spring   5.22ms   5.29ms      188.    21.4KB     8.45    89     4
#2 GKi        198.77µs 207.86µs     4742.      480B    10.3   2313     5

In this case GKi's approach is about 25 times faster than Jon Spring's and allocates less memory.

GKi

This modifies my answer to your prior question, to which this seems like a minor revision. (It's best practice to promptly edit your question if you realize it's unclear or not what you meant to ask.)

library(dplyr) # v1.1.0+
library(tidyr) # for pivot_longer

d |>
  left_join(pivot_longer(data, -L, names_to = "fli"), 
            join_by(fli, closest(quantity >= L)))

Result

   quantity   fli     L value
1       300 fli.1     0  0.00
2       368 fli.1     0  0.00
3       568 fli.4   500  0.12
4        20 fli.5     0  0.00
5      1000 fli.2  1000  0.21
6     37659 fli.2 20000  1.26
7     45000 fli.5 20000  1.46
8      2500 fli.1  2000  0.40
9      4500 fli.2  3000  0.84
10    78453 fli.2 50000  1.89
11     1200 fli.3  1000  0.22
12     1589 fli.4  1000  0.23
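
If the matched L breakpoint is not wanted in the result, it can be dropped after the join; this is a minor variation on the answer above, using dplyr's select:

library(dplyr) # v1.1.0+
library(tidyr)

d |>
  left_join(pivot_longer(data, -L, names_to = "fli"),
            join_by(fli, closest(quantity >= L))) |>
  select(-L)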
Jon Spring

Here, I provide a slightly different approach.

First, I change the data from wide format to long format and rename the columns so the tables can be merged later.

library(reshape2)

data_DT <- melt(data, id = "L")
names(data_DT) <- c("L", "fli", "value")
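
As a quick check of the reshape, data_DT should have one row per (L, fli) pair, i.e. 9 x 5 = 45 rows with columns L, fli and value:

nrow(data_DT)   # 45
names(data_DT)  # "L" "fli" "value"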

Then, I divide the quantity values into groups, using breakpoints based on the values of L in data. The groups will be something like [0,500), [500,1000), and so on. Using simple regex matching, I can then obtain the lower bound of each group, which is used to merge with the long table.

library(data.table)

d_DT <- data.table(d)
# assign each quantity to an interval; right = FALSE makes the intervals
# [L_i, L_i+1), so a quantity equal to a breakpoint (e.g. 1000) maps to that
# breakpoint's own row, matching the output of the other answers
d_DT[, quantity_group := cut(quantity, c(data[, "L"], Inf), right = FALSE)]
# drop everything from the comma on, then the leading bracket, to get the lower bound
d_DT[, L := as.numeric(gsub("^.", "", gsub(",.*", "", quantity_group)))]

d_DT <- merge(d_DT, data_DT, by = c("L", "fli"))
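
As a quick sanity check of the label parsing: cut abbreviates the larger breaks to scientific notation by default (dig.lab = 3), which as.numeric still parses correctly. Using an illustrative label:

lab <- "[2e+04,5e+04)"
gsub(",.*", "", lab)                              # "[2e+04" (drop the upper bound)
gsub("^.", "", gsub(",.*", "", lab))              # "2e+04"  (drop the leading bracket)
as.numeric(gsub("^.", "", gsub(",.*", "", lab)))  # 20000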
TheN