How to extract values from a data.table based on multiple conditions?

I need a function that returns the value of one data.table column based on the values of two other columns:

require(data.table)

dt <- data.table(
    "base" = c("of", "of", "of", "lead and background vocals", "save thou me from", "silent in the face"),
    "prediction" = c("the", "set", "course", "from", "the", "of"),
    "count" = c(258586, 246646, 137533, 4, 4, 4)
)

> dt
#                         base prediction  count
#1:                         of        the 258586
#2:                         of        set 246646
#3:                         of     course 137533
#4: lead and background vocals       from      4
#5:          save thou me from        the      4
#6:         silent in the face         of      4

# the function needs to return the "prediction" value based on the max "count" value for the input "base" value.
# giving the input "of" to function:
> prediction("of")
# the desired output is:
> "the"
# or:
> prediction("save thou me from")
> "the"

1 Answer

We can subset the rows in i, then in j extract the 'prediction' value at the position of the maximum 'count':

dt[base == 'of', prediction[which.max(count)]]
#[1] "the"
dt[base == 'save thou me from', prediction[which.max(count)]]
#[1] "the"

This can be wrapped in a function:

f1 <- function(val) dt[base == val, prediction[which.max(count)]]
f1("of")
#[1] "the"
f1("save thou me from")
#[1] "the"

NOTE: It is better to pass the dataset and the column names as arguments as well, rather than relying on a global dt.
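
As a rough sketch of what the note suggests, the dataset and column names could be passed in as arguments. The function name predict_next and the argument names key_col, value_col and count_col are made up here for illustration and are not part of data.table. Returning NA when nothing matches is also an assumption about the desired behaviour.

```r
library(data.table)

dt <- data.table(
    base = c("of", "of", "of", "lead and background vocals", "save thou me from", "silent in the face"),
    prediction = c("the", "set", "course", "from", "the", "of"),
    count = c(258586, 246646, 137533, 4, 4, 4)
)

# Hypothetical helper: the name and the arguments are illustrative only
predict_next <- function(data, val, key_col = "base",
                         value_col = "prediction", count_col = "count") {
    # positions of the rows whose key column equals the input value
    idx <- which(data[[key_col]] == val)
    # assumption: return NA when the input value is not present
    if (length(idx) == 0L) return(NA_character_)
    # among the matching rows, pick the value with the largest count
    data[[value_col]][idx[which.max(data[[count_col]][idx])]]
}

predict_next(dt, "of")
predict_next(dt, "save thou me from")
```

Because the column names are plain strings, the same helper can be pointed at a differently named table without editing its body.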

akrun
  • The solution works for small datasets. How can I search quickly inside a very large data.table (57M obs)? – Danilo Correa Oct 14 '19 at 22:03
  • @DaniloCorrea Try setting the key: `setkey(dt, "base"); dt[.('of'), prediction[which(count == max(count))[1]]]` – akrun Oct 15 '19 at 06:15
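
The keyed lookup from the comment above can be sketched as follows on the small example table. Whether it is fast enough at 57M rows has not been measured here. The function name f2 is made up for illustration, and nomatch = NULL (which drops non-matching keys) assumes a reasonably recent data.table version.

```r
library(data.table)

dt <- data.table(
    base = c("of", "of", "of", "lead and background vocals", "save thou me from", "silent in the face"),
    prediction = c("the", "set", "course", "from", "the", "of"),
    count = c(258586, 246646, 137533, 4, 4, 4)
)

# Sort dt by 'base' in place and mark it as the key, so that
# dt[.(val)] can use binary search instead of scanning every row
setkey(dt, base)

# Hypothetical wrapper: f2 is an illustrative name only
f2 <- function(val) {
    # nomatch = NULL drops keys that are absent, giving character(0)
    dt[.(val), prediction[which.max(count)], nomatch = NULL]
}

f2("of")
f2("save thou me from")
```

Setting the key is a one-time cost that reorders the table, so it pays off mainly when many lookups follow.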