
I am looking for best practices when it comes to adding many items to an IndexedDB database. The upper bound on the number of items is usually around 100k, and each item is a JS object of roughly 350 bytes.

I maintain a list of pending items (actions) to be taken on the database (each action is either "add", "put", or "delete"), and each item could belong to a different object store.

I simply take the first item, apply it to the appropriate objectStore, and once that's done (either successfully or not), I move on to the next item.

Is there a better (more efficient) way to do this? And are there any scoping issues w.r.t. transaction lifetime I need to be concerned with in the snippet below?

function flushPendingDbActions() {
    var transaction = db.transaction(storeNames, "readwrite");

    var addNext = function() {
        var nextItem = pendingDbActions[0];
        if (nextItem) {
            pendingDbActions.shift();

            var objectStore = transaction.objectStore(nextItem.store),
                params,
                request;

            // "add" and "put" take the whole item; "delete" takes only its key.
            switch(nextItem.action) {
            case 'add':
            case 'put':
                params = nextItem;
                break;
            case 'delete':
                params = nextItem.key;
                break;
            }

            request = objectStore[nextItem.action](params);
            // Move on to the next pending item whether this request
            // succeeded or failed.
            request.onsuccess = request.onerror = addNext;
        }
    };

    addNext();
}
ricklane

1 Answer


Looks alright to me, but a couple of things:

  • I'd stop forward iteration on error. An error event sometimes indicates something is really wrong, and that all later requests on that transaction will probably also fail; by continuing on the error path you are just going to cause N more errors, where N is the number of remaining items.
  • I'd use a stack or LIFO access pattern. Append items onto an array, then pop them off to advance. Avoid shift and FIFO-queue-like access; it is just bad perf in JS because shift does a ton more work than pop. This is nitpicking of course, but you did ask about efficiency, and it only really matters when dealing with a large N. (The sketch after this list uses pop.)
  • Depending on the number of items, you can run into a stack-overflow-style error, because each request is a pending callback on the stack, and from your code that looks like it accumulates without bound. In that case you may want to batch requests into chunks of, say, 100 or 1000 to avoid the issue ever arising, but this only matters if you actually get above the threshold number, and it also depends on the particular platform where the code is running (e.g. mobile might run into trouble sooner).
  • Then there is a more subtle async issue. Right now you are waiting on each request to complete before issuing the next request. Why? If the requests are logically independent, fire them all off immediately and wait on the transaction to complete (which happens whenever the last pending request completes). You generally only need the pattern you're using now, where each action waits for all prior actions to settle, if a later request's logic depends on the results of the prior requests. Otherwise you're introducing idleness where none is needed and basically speed-limiting yourself for no reason, again, provided that the requests are independent. (A sketch of this concurrent approach follows this list.)
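
Putting the last two points together, here is a minimal sketch of the concurrent approach, reusing db, storeNames, and pendingDbActions from the question; the function name and the onDone callback are illustrative, not part of the original code:

function flushPendingDbActionsConcurrently(onDone) {
    var transaction = db.transaction(storeNames, "readwrite");

    // The transaction completes once every request has finished; an
    // unhandled request error aborts it, which stops the remaining work.
    transaction.oncomplete = function() { onDone(null); };
    transaction.onabort = function(event) { onDone(event.target.error); };

    // Reverse once, then pop: same order as before, without shift's cost.
    pendingDbActions.reverse();
    while (pendingDbActions.length) {
        var item = pendingDbActions.pop();
        var objectStore = transaction.objectStore(item.store);
        // Fire the request and move on; no need to wait for onsuccess.
        if (item.action === 'delete') {
            objectStore.delete(item.key);
        } else {
            objectStore[item.action](item); // 'add' or 'put'
        }
    }
}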

Edit - added content:

According to https://www.w3.org/TR/IndexedDB-2/#transaction-lifetime-concept:

Unless otherwise defined, requests must be executed in the order in which they were made against the transaction. Likewise, their results must be returned in the order the requests were placed against a specific transaction.

Therefore, you should be able to fire off your requests concurrently instead of serially, which should avoid the TransactionInactiveError. This is true even if you fire off requests that amount to adding an item and then deleting that same item.

With too many pending requests you then run into the stack overflow issue, because requests are located on the stack, and in that case you should consider buffering. So take the pendingDbActions array and process it in chunks: take, say, 1000 at a time, pass those 1000 to a processChunk helper function that fires off 1000 concurrent requests, wait for it to complete, and then process the next chunk, until all chunks are processed.
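
For illustration, a sketch of that chunked approach might look like the following; CHUNK_SIZE, flushInChunks, and processChunk are made-up names, and the error handling is deliberately minimal:

var CHUNK_SIZE = 1000; // illustrative; tune for your platform

function flushInChunks(onDone) {
    var index = 0;

    var nextChunk = function() {
        if (index >= pendingDbActions.length) {
            pendingDbActions.length = 0; // everything has been applied
            onDone(null);
            return;
        }
        var chunk = pendingDbActions.slice(index, index + CHUNK_SIZE);
        index += CHUNK_SIZE;
        // Wait for this chunk's transaction to complete before starting the next.
        processChunk(chunk, function(error) {
            if (error) {
                onDone(error);
            } else {
                nextChunk();
            }
        });
    };

    nextChunk();
}

function processChunk(chunk, callback) {
    var transaction = db.transaction(storeNames, "readwrite");
    transaction.oncomplete = function() { callback(null); };
    transaction.onabort = function(event) { callback(event.target.error); };

    // Fire off every request in the chunk without waiting in between.
    chunk.forEach(function(item) {
        var objectStore = transaction.objectStore(item.store);
        if (item.action === 'delete') {
            objectStore.delete(item.key);
        } else {
            objectStore[item.action](item);
        }
    });
}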

Josh
  • The reason I'm following this pattern is this SO question from a few years back: https://stackoverflow.com/questions/10471759/inserting-large-quantities-in-indexeddbs-objectstore-blocks-ui, and the reason I'm using a queue vs a stack is because order matters. It's possible to have an Add followed by a Delete, so I need to ensure they are applied in that order. Of course, I could use a different data structure for my pending list (e.g. a hash indexed by "key" instead of an array). Is there any concern around a maximum number of open requests? – ricklane Nov 21 '17 at 14:57
  • Also, the reason I asked the question in the first place is because I occasionally get a TransactionInactiveError exception in this loop, and it's not clear to me why. It made me wonder if perhaps I needed to move the db.transaction() creation to the inner function (i.e. create the transaction every time addNext() is called) – ricklane Nov 21 '17 at 14:59
  • You should be able to use the same transaction provided that each request involves no additional async code (e.g. no intermediate call to XMLHttpRequest or setTimeout or something like that). An inactive error indicates to me you are doing something funky somewhere that is not a straightforward process in the same event loop epoch, or that you really are trying to process too many items at once (you're reaching the "practical" maximum number of open requests on the platform where you are testing). – Josh Nov 21 '17 at 15:03
  • You should be able to use a LIFO access pattern, in other words use pop instead of shift, you just need to append items to the stack in reverse. shift is really truly terrible horrific perf. – Josh Nov 21 '17 at 15:05
  • Could you expand on that last point? You can see from the code above (which is the actual code in use) that I'm not doing anything other than add/put/delete operations and handling the onsuccess/onerror callbacks (i.e. no opportunity to return to the event loop). But is there a possibility that it's simply too many items? Is this what you meant by something similar to a stack-overflow exception? – ricklane Nov 21 '17 at 15:10
  • The question you linked to about inserting a large number of items is what I was talking about in my 3rd point. Same concept. Oh, we commented at the same time; yes, regarding your last question, that is what I am referring to, specifically that you may just be doing too many items. This is tricky because it is also affected by request type and payload (e.g. are you writing large blobs or basic objects with a few ints and strings). So you might get a clearer answer if you specify the number of items you are working with on average. – Josh Nov 21 '17 at 15:10
  • So are you suggesting I use splice+pop? I assumed splice was similar complexity to shift. – ricklane Nov 21 '17 at 15:13
  • Are the items you are going to process known before the start of the call to flushPendingDbActions? If so, basically just make one call to pendingDbActions.reverse(), then use pop within flushPendingDbActions. This will still be faster than using shift. – Josh Nov 21 '17 at 15:15
  • I don't mean to belabor the point, but I just want to make sure I understand. You said it's possible I'm "reaching the 'practical' maximum number of open requests on the platform where you are testing" -- but with the pattern I've adopted, don't I only have one open request at any given time? – ricklane Nov 21 '17 at 15:15
  • You know what, you're right, my bad, I keep thinking of how I personally experience this problem. You're delaying your next request until the end of the previous request, so you're absolutely right. So the problem is that at some point a request takes too long, the tx watchdog finds no active requests, times out the tx, the tx becomes inactive, and the next request triggers the tx inactive error. – Josh Nov 21 '17 at 15:17
  • Got it. Thanks. For now it sounds like my best bet is to just do some more aggressive exception handling and handle these scenarios more gracefully (and use a more efficient data structure as well). – ricklane Nov 21 '17 at 15:19
  • Well, I'd still try the fire-and-forget approach where you process requests concurrently instead of waiting for each one to complete before firing off the next (i.e. processing requests serially). That is kind of how the API was intended to be used. Then, if N is large, you might run into the too-many-items issue, but you also might not. I'd try that. – Josh Nov 21 '17 at 15:20
  • Is there really such a thing as fire-and-forget, though, if I need to manage some maximum concurrent request count? In other words, I think I need to wait on the first batch of items to resolve before I can move on to the next batch, lest I risk exceeding some limit. – ricklane Nov 21 '17 at 15:25
  • I wouldn't worry about maximum concurrent request count until (1) you clarify in your question you are dealing with a large number of items, (2) you actually run into the error. Lately the platforms are pretty snappy even for large N. – Josh Nov 21 '17 at 15:26
  • I've updated the question to specify - upper bound is usually 100k items, and the items are pretty small - 350 bytes or so. – ricklane Nov 21 '17 at 15:32
  • Oh. That is huge N. Yes, you are definitely in the range of running into the too-many-items issue if you switch to the concurrent approach, and would want to use the buffered technique mentioned in the question you linked to in your comment, if you switch to that. Also, you again probably don't want to continue on onerror; if you look closely at the answer to the linked question, it does not continue in that case. – Josh Nov 21 '17 at 15:34
  • Let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/159508/discussion-between-ricklane-and-josh). – ricklane Nov 21 '17 at 16:45