I have a Spark DataFrame that I broadcast as an Array[Array[String]]. I need to do a range lookup on two of its columns.
Right now I have something like this:
val cols = data.filter(_(0).toLong <= ip).filter(_(1).toLong >= ip).take(1) match {
  case Array(t) => t
  case _ => Array.empty[String]
}
The data file below is stored as an Array[Array[String]] and passed to the filter shown above. The header row is shown only for reference and is not part of the array.
Sample data file:
startIPInt | endIPInt | lat | lon
676211200 | 676211455 | 37.33053 | -121.83823
16777216 | 16777342 | -34.9210644736842 | 138.598709868421
17081712 | 17081712 | 0 | 0
Sample value to search for:
ip = 676211325
Given an ip that falls between startIPInt and endIPInt (inclusive), I want the rest of the matching row (here lat and lon).
Each lookup takes 1-2 seconds, and I am not even sure the second filter condition is being executed (in debug mode it only ever seems to execute the first condition). Can someone suggest a faster and more reliable lookup?
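To make the goal concrete, here is a minimal sketch of one possible direction, not a confirmed solution: parse the bounds to Long once, sort by the start value, and binary-search on each lookup instead of scanning and re-parsing every row. RangeLookup and its find method are hypothetical names introduced for illustration, and the sketch assumes the [startIPInt, endIPInt] ranges do not overlap and that the strings contain no surrounding whitespace.

```scala
import java.util.Arrays

// Hypothetical helper (not part of the original code).
// Assumption: the [start, end] ranges do not overlap.
final class RangeLookup(rows: Array[Array[String]]) extends Serializable {
  // Sort rows by start value so a binary search is possible.
  private val sorted: Array[Array[String]] = rows.sortBy(_(0).toLong)
  // Parse the bounds once here, so lookups never call toLong.
  private val starts: Array[Long] = sorted.map(_(0).toLong)
  private val ends: Array[Long] = sorted.map(_(1).toLong)

  def find(ip: Long): Option[Array[String]] = {
    val pos = Arrays.binarySearch(starts, ip)
    // Exact hit: pos is the index. Otherwise pos = -(insertionPoint) - 1,
    // and the only candidate is the row just before the insertion point.
    val idx = if (pos >= 0) pos else -pos - 2
    if (idx >= 0 && ends(idx) >= ip) Some(sorted(idx)) else None
  }
}

// Usage with the sample rows from the question (header row excluded).
val data = Array(
  Array("676211200", "676211455", "37.33053", "-121.83823"),
  Array("16777216", "16777342", "-34.9210644736842", "138.598709868421"),
  Array("17081712", "17081712", "0", "0")
)
val lookup = new RangeLookup(data)
val cols = lookup.find(676211325L).getOrElse(Array.empty[String])
```

With this layout each lookup costs O(log n) comparisons on primitive longs. If something like this were used in Spark, the RangeLookup instance would presumably be built once on the driver and broadcast in place of the raw array, which is why the sketch marks it Serializable.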
Thanks!