
I am trying to find an efficient way to build a hash table for a large dataset, where each lookup uses two keys and returns multiple values.

A sample dataset can be generated as follows:

set.seed(1)
Data <- data.frame(
  X = sample(1:10),
  Y = sample(1:10),
  val1 = sample(1:10),
  val2 = sample(1:10),
  val3 = sample(1:10)
)

I have a large amount of location data (X and Y in the sample), and several values need to be mapped to each data point. I will need to look up the mapped values millions of times in my code. Ideally, I could look up a given (X, Y) pair and get back the vector (val1, val2, val3). I'm currently using:

getPixIndex <- function(Data, x, y) {
  return(which(Data$X == x & Data$Y == y))
}

This returns the row index, which then lets me access the corresponding val1, val2, and val3 for the (X, Y) pair.
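
As a minimal sketch of how the index is used with the sample data: the query point here is simply taken from the first row for illustration, and the names `idx` and `vals` are only placeholders.

```r
# Recreate the sample data from above
set.seed(1)
Data <- data.frame(
  X = sample(1:10),
  Y = sample(1:10),
  val1 = sample(1:10),
  val2 = sample(1:10),
  val3 = sample(1:10)
)

# Current lookup: scan every row for a matching (X, Y) pair
getPixIndex <- function(Data, x, y) {
  which(Data$X == x & Data$Y == y)
}

# Query a pair that is known to exist (row 1 of the sample data)
x <- Data$X[1]
y <- Data$Y[1]
idx <- getPixIndex(Data, x, y)

# Pull the mapped values for that row as a named vector
vals <- unlist(Data[idx, c("val1", "val2", "val3")])
print(vals)
```

Each call compares every row of `Data` against the query, so the cost of one lookup grows with the number of rows.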

However, I am wondering whether this is the most efficient way to perform lookups. I have searched for hash table implementations in R and found environments, but they seem to require character keys, so I would have to convert every (X, Y) pair into a string, which doesn't seem efficient. Is there a more efficient way to create this lookup table?
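
For reference, here is a rough sketch of the environment-based approach I am unsure about. The `"X_Y"` key format and the names `lookup` and `getVals` are assumptions made up for illustration, not something taken from a package, and I have not timed this.

```r
# Recreate the sample data from above
set.seed(1)
Data <- data.frame(
  X = sample(1:10),
  Y = sample(1:10),
  val1 = sample(1:10),
  val2 = sample(1:10),
  val3 = sample(1:10)
)

# Build one character key per row, e.g. "3_7" (assumed key format)
keys <- paste(Data$X, Data$Y, sep = "_")

# Environment used as a hash table: key -> c(val1, val2, val3)
lookup <- new.env(hash = TRUE)
for (i in seq_len(nrow(Data))) {
  assign(keys[i], unlist(Data[i, c("val1", "val2", "val3")]), envir = lookup)
}

# Hypothetical helper: converts the pair to a string on every call,
# then fetches the stored vector (errors if the pair is absent)
getVals <- function(x, y) {
  get(paste(x, y, sep = "_"), envir = lookup, inherits = FALSE)
}

# Query a pair that is known to exist (row 1 of the sample data)
print(getVals(Data$X[1], Data$Y[1]))
```

The table is built once up front, but every lookup still pays for a `paste()` call to form the key, which is the conversion cost I am worried about.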

Larry
  • Look at https://stackoverflow.com/questions/7818970/is-there-a-dictionary-functionality-in-r/44570412#44570412. If I understand correctly, the **dict** package should meet your requirements. Unfortunately, it is not available on CRAN. – Stéphane Laurent Jan 29 '19 at 03:29
  • Hi Stephane - thank you, I have looked at it and I'll do timing tests on both my current implementation and the dict one. – Larry Jan 29 '19 at 18:48

0 Answers