
PROBLEM: I am trying to speed up my IndexedDB searches by using multiple web workers and therefore executing multiple read transactions simultaneously, but it's not really working, and my CPU only gets to around 30-35% utilization. I have a 4-core processor and was hoping that spawning 4 web workers would dramatically reduce the search time.

I am using Firefox 53 with a WebExtension; other browsers are not an option.

DATABASE: I have a data store with about 250,000 records, each with about 30 keys, some of them containing paragraphs of text.

TASK: Perform a string search on a given key to find matching values. Currently, this takes about 90 seconds to do on a single thread. Adding an additional worker reduces that time to about 75 seconds. More workers than that have no noticeable effect. An acceptable time to me would be under 10 seconds (somewhat comparable to an SQL database).

CURRENT STRATEGY: Spawn a worker for each processor, and create a Promise that resolves when the worker sends a message. In each worker, open the database, divide the records up evenly, and search for the string. Each worker starts on the record matching its index (the first worker starts on the first record, the second on the second, and so on) and then advances by the number of workers. So the first worker checks records 1, 5, 9, etc., and the second checks 2, 6, 10, etc. Alternatively, I could have the first worker check records 1-50, the second check 51-100, etc. (but with thousands of records per chunk in practice).
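The interleaved (stride) partitioning described above can be sketched as a small helper. This is purely illustrative; `strideIndices` is a hypothetical name, and in the real code the stride is driven by `cursor.advance()` rather than precomputed indices:

```javascript
// Compute which record positions worker `pos` of `proc` workers visits
// under the interleaved scheme: start at your own index, then step by
// the total worker count.
function strideIndices(total, pos, proc) {
    const out = [];
    for (let i = pos; i < total; i += proc) {
        out.push(i);
    }
    return out;
}
```

With 10 records and 4 workers, worker 0 visits positions 0, 4, 8 and worker 1 visits 1, 5, 9, so every record is covered exactly once.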

Using getAll() on a single thread took almost double the time and 4 GB of memory. Splitting it into 4 ranges reduces the time significantly, to a total of about 40 seconds after merging the results (though that figure varies wildly between runs).
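The range-splitting variant can be sketched as below. This assumes numeric, roughly dense primary keys between hypothetical bounds `lo` and `hi` (real keys might first need to be sampled or counted); each worker would feed its pair of bounds to `IDBKeyRange.bound()` before calling `getAll()`:

```javascript
// Split the key space [lo, hi] into `parts` contiguous, non-overlapping
// sub-ranges of near-equal size, one per worker.
function splitRange(lo, hi, parts) {
    const size = Math.ceil((hi - lo + 1) / parts);
    const ranges = [];
    for (let i = 0; i < parts; i++) {
        const start = lo + i * size;
        const end = Math.min(start + size - 1, hi);
        if (start <= hi) ranges.push([start, end]);
    }
    return ranges;
}
```

A worker assigned `[start, end]` would then run something like `store.getAll(IDBKeyRange.bound(start, end))` over its slice.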

Any ideas on how I can make this work, or other suggestions for significantly speeding up the search?

background.js:

var key = "whatever", val = "something";  // Placeholders: index to search and string to match
var proc = navigator.hardwareConcurrency;  // Number of processors
var wPromise = [];  // Array of promises (one for each worker)
var workers = [];

/* Create a worker for each processor */
for (var pos = 0; pos < proc; pos++) {
    workers[pos] = new Worker("js/dbQuery.js");
    wPromise.push(
        new Promise( resolve => workers[pos].onmessage = resolve )
    );
    workers[pos].postMessage({key:key, val:val, pos:pos, proc:proc});
}

return Promise.all(wPromise);  // Do something once all the workers have finished
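Once `Promise.all()` resolves, each resolved value is a worker's message event whose `data` property holds that worker's array of matching primary keys, so combining them is a simple concatenation. A minimal sketch (`mergeWorkerResults` is a hypothetical helper name):

```javascript
// Flatten the per-worker key arrays delivered via the message events
// into a single result set.
function mergeWorkerResults(events) {
    return events.reduce((all, e) => all.concat(e.data), []);
}
```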

dbQuery.js:

onmessage = e => {
    var data = e.data;
    var req = indexedDB.open("Blah", 1);
    req.onsuccess = e => {
        var keyArr = [];
        var db = e.currentTarget.result;
        db.transaction("Blah").objectStore("Blah").index(data.key).openKeyCursor().onsuccess = e => {
            var cursor = e.target.result;
            if (cursor) {
                if (data.pos) {
                    cursor.advance(data.pos); // Start searching at a position based on which web worker
                    data.pos = false;
                }
                else {
                    if (cursor.key.includes(data.val)) {
                        keyArr.push(cursor.primaryKey);  // Store key if value is a match
                    }
                    cursor.advance(data.proc); // Advance position based on number of processors
                }
            }
            else {
                db.close();
                postMessage(keyArr);
                close();
            }
        }
    }
}
Daniel H
  • Note, have not tried `IndexedDB`, though curious why `Promise.all()` is necessary? – guest271314 Jun 03 '17 at 22:46
  • I use `Promise.all` because I create a promise for each worker and store them in an array. I only want to use the data once all the workers have returned their results. It's not technically necessary, or really related to the underlying issue of speed. – Daniel H Jun 03 '17 at 22:50
  • You could alternatively utilize `Promise.race()` to return a resolved `Promise` if any of the matches are located, instead of waiting for all `Promise`s to be resolved, see [How to determine which canvas object completed its random speed animation first](https://stackoverflow.com/questions/37289915/how-to-determine-which-canvas-object-completed-its-random-speed-animation-first/37290032#37290032) – guest271314 Jun 03 '17 at 22:51
  • "...have the first worker check 1-50, second worker check 51-100..." - have you timed this approach? I would expect it to be slightly more efficient as the individual cursors are processing sequential data. – Joshua Bell Jun 05 '17 at 16:59
  • I'm not familiar with the FF implementation of IDB, but it's likely that individual worker threads are submitting requests to a single database thread/process that's executing the individual database operations for all transactions. What the workers buy you is distributing the "overhead" - generating events, executing the event loop, deserializing records into JS values, executing your filtering logic, etc. This matches the behavior you're observing - you've reduced the overhead but are bottoming out in the cost of the underlying operations. – Joshua Bell Jun 05 '17 at 17:05
  • Another option is to use `getAll()`/`getAllKeys()` - supported since FF51 - to avoid the script/event iteration overhead of cursors. (You'll still need to batch the requests to avoid hitting memory limits.) This is slightly trickier in your case since you need both the index key and primary key. – Joshua Bell Jun 05 '17 at 17:09
  • @JoshuaBell Yes, I tried the other approach I mentioned of reading sequentially, but it didn't do much to improve the performance. The biggest issue is the cursor moving through the records, which is why I thought distributing that task among 4 workers would dramatically reduce the time. I don't understand why opening 4 instances of the database wouldn't create 4 separate threads/transactions. – Daniel H Jun 06 '17 at 01:53
  • @JoshuaBell I just added the times for using `getAll()` to the question. I think I've decided to use that method to load the entire database into memory (~1.5 GB), and run all future searches from that. By front-loading the 40 seconds of reading I can get any future results in less than a second. This strategy basically turns IDB into localStorage since I'm not using it for searching anymore. – Daniel H Jun 06 '17 at 02:00
  • Did you try to use "nextunique" cursor direction and the retrieval of primary keys in extra idb requests? – 4esn0k Mar 09 '18 at 10:32

1 Answer


Any ideas on how I can make this work, or other suggestions for significantly speeding up the search?

You can substitute Promise.race() for Promise.all() to get a resolved Promise as soon as any one worker finds a match, instead of waiting for all of the Promises passed to Promise.all() to be resolved.
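A minimal sketch of that idea, using plain promises as stand-ins for the worker promises built in the question's background.js (in the real code each promise wraps a Worker's onmessage handler):

```javascript
// Settle with whichever worker reports back first, rather than
// waiting for every worker to finish its slice of the scan.
function raceWorkers(promises) {
    return Promise.race(promises);
}

// Stand-in "workers": the faster one wins the race.
const fast = new Promise(resolve => setTimeout(() => resolve(["key-3"]), 10));
const slow = new Promise(resolve => setTimeout(() => resolve(["key-7"]), 50));
```

Note the trade-off: the race settles on the first worker to finish, which only helps if a single worker's partial result is enough for your use case.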

guest271314
  • I'm not really sure how that makes the search run any faster. The main issue is how long it takes to get the final match, which is a function of the number of values I must check for a given key. – Daniel H Jun 03 '17 at 22:55
  • @DanielH See _"or other suggestions for significantly speeding up the search"_ Try `Promise.race()` and note the benchmark differences, if any. Currently, `javascript` at Question appears to await the fulfillment of all `Promise` objects passed to `Promise.all()` instead of calling `.then()` when any of the `Promise` values have resolved. Outside of that adjustment, if the search algorithm is currently performing the necessary task, not sure how the search could be improved if you have concluded that creating more `WebWorker` instances does not decrease total time of procedure? – guest271314 Jun 03 '17 at 22:57