
I have a Spark DataFrame that I broadcast as an Array[Array[String]]. I need to do a range lookup on two columns.

Right now I have something like this ->

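// data: the broadcast Array[Array[String]] (no header row); ip: Long
// Keep rows with startIPInt <= ip, then rows with endIPInt >= ip, and take the first match
val cols: Array[String] = data.filter(_(0).toLong <= ip).filter(_(1).toLong >= ip).take(1) match {
    case Array(t) => t
    case _ => Array.empty[String] // no matching range
  }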
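For comparison, here is a minimal sketch of the same lookup done in a single pass with `find`, assuming `ip` is a `Long` and each row has the layout startIPInt, endIPInt, lat, lon. `Array.filter` is strict, so in the code above the first filter scans every row and the second filter then runs only over the rows that passed the first one; `find` checks both bounds on each row and stops at the first match. This is still a linear scan per lookup. The object name `FindLookup` is hypothetical.

```scala
object FindLookup {
  // Returns the first row whose [startIPInt, endIPInt] range contains ip,
  // or an empty array when no range matches.
  def lookup(rows: Array[Array[String]], ip: Long): Array[String] =
    rows
      // Both bounds are evaluated in one predicate for every row until a match is found
      .find(r => r(0).toLong <= ip && r(1).toLong >= ip)
      .getOrElse(Array.empty[String])
}
```

With the sample rows, `FindLookup.lookup(data, 676211325L)` returns the row starting at 676211200.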

The data file below is stored as Array[Array[String]] (the header row is shown only for reference and is not part of the array) and is passed to the filter shown above.

Sample data file ->

startIPInt | endIPInt  | lat               | lon
676211200  | 676211455 | 37.33053          | -121.83823
16777216   | 16777342  | -34.9210644736842 | 138.598709868421
17081712   | 17081712  | 0                 | 0

Sample value to search ->

ip = 676211325

Based on which startIPInt/endIPInt range contains the value, I want the remaining columns (lat, lon) of the matching row.

Each lookup takes 1-2 seconds, and I am not even sure the second filter condition is being evaluated (in debug mode it only ever seems to execute the first condition). Can someone suggest a faster and more reliable way to do this lookup?
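If the ranges do not overlap (an assumption: the sample rows do not overlap, but the question does not say this holds for the whole file), sorting them by startIPInt once lets each lookup be a binary search instead of a linear scan. The sketch below parses the bounds to `Long` once, keeps a sorted array of start values, finds the last start that is `<= ip`, and then checks that row's end. `IpRange` and `IpRangeIndex` are hypothetical names; the class extends `Serializable` so an instance could be broadcast in place of the raw array (for example with `sc.broadcast(...)` and read back via `.value`).

```scala
// One parsed row; bounds are converted to Long once instead of on every lookup.
final case class IpRange(start: Long, end: Long, lat: String, lon: String)

// Sorted index over [start, end] ranges. Assumes the ranges do not overlap.
final class IpRangeIndex(rows: Array[Array[String]]) extends Serializable {
  // Rows parsed and sorted by start (startIPInt)
  private val ranges: Array[IpRange] =
    rows
      .map(r => IpRange(r(0).toLong, r(1).toLong, r(2), r(3)))
      .sortBy(_.start)

  // Parallel array of start values used for the binary search
  private val starts: Array[Long] = ranges.map(_.start)

  // Returns the range containing ip, if any, in O(log n).
  def lookup(ip: Long): Option[IpRange] = {
    // Binary search for the index of the last range whose start is <= ip
    var lo = 0
    var hi = starts.length - 1
    var candidate = -1
    while (lo <= hi) {
      val mid = (lo + hi) >>> 1
      if (starts(mid) <= ip) { candidate = mid; lo = mid + 1 }
      else hi = mid - 1
    }
    // With non-overlapping ranges only that candidate can contain ip
    if (candidate >= 0 && ranges(candidate).end >= ip) Some(ranges(candidate))
    else None
  }
}
```

On the sample rows, `new IpRangeIndex(data).lookup(676211325L)` returns the range 676211200 to 676211455 with lat 37.33053 and lon -121.83823. The index should be built once (before broadcasting), not per lookup; if ranges can overlap, checking a single candidate may miss a match.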

Thanks!

user3868051
