- You can use binary search with the first letter of the URL. This will come handy because whitelists can grow pretty fast. However you cannot do this with patterns. (e.g.: *.somedomain.com)
- Consider about using a hashtable to store the whitelist. You can make it efficient and specialized by writing your own hash function.
- Regex will make things easier, but will also make things slow at times. If you use regex, make sure you know what you are doing. You can shrink the comparison list first by one of the methods described above.
Edit: Here's the binary search I was talking about. This is applicable only if wildcards are not used.
function binarySearch(needle, haystack, startIndex, endIndex) {
//console.log("\ttrying to find " + needle + " between " +
// haystack[startIndex] + "(" + startIndex + ") and " +
// haystack[endIndex] + "(" + endIndex + ")");
// the basic case, where the list is narrowed down to 1 or 2 items
if (startIndex == endIndex || endIndex - startIndex == 1) {
if (haystack[startIndex] == needle)
return startIndex;
if (haystack[endIndex] == needle)
return endIndex;
return -1;
}
var midIndex = Math.ceil((startIndex + endIndex) / 2);
//console.log("\t\tgot " + haystack[midIndex] + "(" + midIndex +
// ") for middle of the list.");
var comparison = haystack[midIndex].localeCompare(needle);
//console.log("\t\tcomparison: " + comparison);
if (comparison > 0)
return binarySearch(needle, haystack, startIndex, midIndex);
if (comparison < 0)
return binarySearch(needle, haystack, midIndex, endIndex);
return midIndex; // (comparison == 0)
}
var sitelist = [ // the whitelist (the haystack).
"alpha.com",
"bravo.com",
"charlie.com",
"delta.com",
"echo.com",
"foxtrot.com",
"golf.com",
"hotel.com",
"india.com",
"juliet.com",
"kilo.com",
"lima.com",
"mike.com",
"november.com",
"oscar.com",
"papa.com",
"quebec.com",
"romeo.com",
"sierra.com",
"tango.com",
"uniform.com",
"victor.com",
"whiskey.com",
"xray.com",
"yankee.com",
"zulu.com"
];
function testBinarySearch(needle) {
console.log("trying to find " + needle);
var foundIndex = binarySearch(needle, sitelist, 0, sitelist.length - 1);
if (foundIndex < 0)
console.log(needle + " not found");
else
console.log(needle + " found at: " + foundIndex);
}
// note that the list is already sorted. if the list is not sorted,
// haystack.sort();
// we can find "uniform.com" using 5 comparisons, instead of 20
testBinarySearch("uniform.com");
// we can confirm the non-existance of "google.com" in 4 comparisons, not 26
testBinarySearch("google.com");
// this is an interesting (worst) case, it takes 5 comparisons, instead of 1
testBinarySearch("alpha.com");
// "zulu.com" takes 4 comparisons instead of 26
testBinarySearch("zulu.com");
When your list grows, binary search can scale very well. I would not go to other pros and cons of binary search since they are very well documented in large number of places.
More SO questions on JavaScript binary search: