1

On my JSP page I have a select object that could contain 60k+ objects. I am storing those 60k+ objects into a javascript Array called "masterList". I have provided the user an input box to filter the list. Filtering is based on a "begins with.." approach. Is there a faster way to do this? I am noticing performance issues when a user inputs zero or 1 character into the input box.

This is how my code looks now.

    var numShown = 0;
    var listLength = masterList.length;     

    for(i = 0; i < listLength; i++){
        if(masterList[i].search(re) != -1){
            selectBox[numShown] = new Option(masterList[i], masterList[i]);
            numShown++;
        }

        // Stop when the number to show is reached and input present
        if(input.value != "" && numShown == maxToShow){
            break;
        }
    }
AsTeR
  • 7,247
  • 14
  • 60
  • 99
Justin
  • 800
  • 2
  • 11
  • 26
  • You could filter it server-side instead. Are you getting the data originally from a database? – mellamokb Jul 13 '12 at 19:45
  • data is coming from a java class. I would like to keep things on client side if possible. – Justin Jul 13 '12 at 19:47
  • how about not filtering if the input is 0 or 1 characters? – Otto Allmendinger Jul 13 '12 at 19:48
  • The usual workaround is to only provide autocompletion on large lists when the user has entered two or more characters. At the very least, try to avoid triggering a search for *zero* characters (full scan, no matter how efficient your code is). – Frédéric Hamidi Jul 13 '12 at 19:48
  • 5
    The easiest trick is to break the list into a tree, with each node representing the first letter of the words it contains (e.g. a: ["aardvark", "avacado"] etc. – Mike Robinson Jul 13 '12 at 19:49
  • @MikeRobinson....excellent idea :) – Jashwant Jul 13 '12 at 19:51
  • On the pure problem of searching in an array : I suggest to implement a dichotomic search into a sorted array, using a dictionary is a point only reducing complexity by the number of letter... Which is good but not as high as it could be. – AsTeR Jul 13 '12 at 19:57
  • You should use Chome's profiler https://developers.google.com/chrome-developer-tools/docs/profiles to identify where you're spending most of your CPU time. – Brian Nickel Jul 13 '12 at 22:08

7 Answers7

2

What is the Option() object that is being created each and every time through the loop? That in itself could be a source of performance problems. Its unlikely you need to be creating new objects all the time, so if you can add that code it would help to optimize your code.

A very basic way to start optimizing the search times would be to create a second object containing the start indexes of each letter in your masterList. Note that if your masterList data changes while the user is working with it, your indexes object would need to be recalculated.

e.g.

var masterList = [aardvark, apple, balloon, blue, cat, dog];
var indexes = {'a':0, 'b':2, 'c':4, 'd':5};

Each time the user types, take the 1st character they've typed in and reference its starting index in the 'indexes' object. Then start your for loop from that index. Also remember that you must stop your for loop when you have not found a match, or extended the search beyond the range of the letter matches. Basically, there's no point in searching all the way to the end of masterList when your initial search criteria are not met.

Other notes: You need to declare i with a var in your for loop. Whether you know it or not, you've done a good thing by caching the length with listLength. Javascript would otherwise be re-calculating masterList.length on each iteration.

Geuis
  • 41,122
  • 56
  • 157
  • 219
  • `new Option(text, value)` is the pre-DOM API for `document.createElement('option')` while the create/destroy is costly, it's likely not the source of the major problem since it's being done a maximum number of times for every search string. – Brian Nickel Jul 13 '12 at 21:43
1

JQuery UI offers you a good autocompletion component, man can consider you have a design flaw if you push the 60k elements to the client.

This component enables you to :

  • Simply autocomplete from an array, if you still want to push all element server side ;
  • Build a more complex autocomplete schema that's using a webservice to provide the elements to display (considering server side caching as suggesting in comments below would be a good idea).
AsTeR
  • 7,247
  • 14
  • 60
  • 99
  • 3
    You need this, plus a minimum length to the filter, plus filtering on the server-side, plus server-side caching layer if possible. – Adrian J. Moreno Jul 13 '12 at 19:54
1

Your primary problem is that you're scanning the whole list every time. This is something that you can avoid,

One approach is to sort the list. By doing this, you can determine when searching can stop scanning, once you get to the end of your matches.

You should also perform a binary search, rather than a scan. This can be done relatively easily on a sorted list.

Failing that, you should create a directory of your objects instead. Something like:

{
   aa: ['aardvark', 'aardwolf' ],
   ab: ['abstain'.....
   ...
}

This reduces the amount of scanning that is truly necessary

Dancrumb
  • 26,597
  • 10
  • 74
  • 130
1

[edit] See this jsfiddle for an example, using an initial array of 100.000 elements.

Aside from the other more or less useful suggestions:

First optimization would be to sort your array. Second, you could prepare the large array by converting its values into ready made Option objects. Third, did you think of applying the new Array map and filter methods? Fourth, use RegEx test in stead of search. With these, you could reduce your filtering code to:

//once, on page load
masterlist = masterlist
             .sort()
             .map(function(item){return new Option(item,item);});

//filtering
var optsfiltered = masterlist.filter(function(item){
     re.lastIndex = 0;
     return re.test(item.value);
    }).slice(0,maxToShow);

//replace selectBox options
selectBox.options.length = 0;
for (var i=0;i<optsfiltered.length;i++){
  selectBox[i] = optsfiltered[i];
}

For browser compatibility, you may want to use shims for map and filter

KooiInc
  • 119,216
  • 31
  • 141
  • 177
  • +1 for extending my knowledge of the use of shims for x-browser compatibility. Ref. [Another SO post](http://stackoverflow.com/a/2790686/652738) Your answer also kept the server out of the answer; as I wanted to perform everything locally. – Justin Jul 16 '12 at 17:45
0

The best option for performance would be to load the data after inputs. That is, when a user types "tes" the server send back a JSON string like ["test", ...].

The next best option would be to break things into trees as @Dancrumb suggested.

If you can't do either of these then the next best approach would be to make sure your array is in order and do a binary search for the starting point. Nicholas C. Zakas has a good example here: http://www.nczonline.net/blog/2009/09/01/computer-science-in-javascript-binary-search/

That'll get you from O(N) to O(lg(N))

You'll have to modify it slightly to search for strings that start with rather than equal your value, then search backwards for the first instance.

Update

Just saw you're using .search. That is means you can't use any of these approaches and are stuck with either an expensive search on the server or an expensive search on the client. Consider checking that it starts with the value instead for a faster search: How to check if a string "StartsWith" another string?

Community
  • 1
  • 1
Brian Nickel
  • 26,890
  • 5
  • 80
  • 110
0

To loop through the array the quickest way your best to do a for loop like...

for(var z =listLength;--z;) { do what ever }

As this saves an evaluation on every loop (e.g. the bit thats a

If i was you i would use a differnt format to hold it in for fast searching like Trie

If you have to use an array at least split them in to an array of a's,b's,c's and so on...

Lemex
  • 3,772
  • 14
  • 53
  • 87
0

Okay, here is the biggest problem you have and why none of the other solutions (including my other one) won't work:

if(masterList[i].search(re) != -1)

You're performing a regular expression search on every entry up until you've populated your list! Not only is this a costly operation, it means that you're probably interested in matches in the middle of words, which means you can't depend on the prefix-based optimization that everyone is proposing.

Here is an approach that will work reasonably well if (A) you have a high hit rate and (B) you really, really want to do search:

  1. When the user types a letter, for example "a", kick off your search. Find all matching items up to your limit, for example 5. Save an entry in a cache. For example:

    myCache['a'] = {
        entries: ['aaaa', 'aaab', 'aaac', 'aaad', 'aaae'],
        lastIndex: 5
    };
    
  2. When the user types the next character, and now the string is "ab", take the string minus one character and you've got "a", look in myCache["a"] and check the values, you've got a match with "aaab" but everything else fails. You still need 4 more.

  3. Start at myCache["a"].lastIndex + 1. Keep searching until you have a match. Store your results in the cache and you've got:

    myCache['ab'] = {
        entries: ['aaab', 'aaba', 'aabb', 'aabc', 'aabd'],
        lastIndex: 29
    };
    
  4. Repeat step 2 and 3 as the string length increases.

Brian Nickel
  • 26,890
  • 5
  • 80
  • 110