Background

I have a Node- and React-based application that uses Firebase for storage and as its database. In my application, users can fill out a form where they upload an image and select a time for the image to be added to their website. I save each image update as an object in my Firebase database, with the images arranged in ascending order of update time, like so:

user-name: {
   images: [
     {
        src: 'image-src-url',
        updateTime: 1503953587727
     },
     {
        src: 'image-src-url',
        updateTime: 1503958424838
     }
   ]
}

Scale

My application's db could potentially get very large, with a lot of users and images. I'd like to ensure scalability.

Issue

How do I check when a specific image object's time has been reached and then execute a function? (I do not need assistance with the actual function being run, just with checking the db for a specific time.)

Attempts

I've thought about running a cron job with node-cron that checks the entire database every 60s (users can only specify the minute at which the image will update, not the seconds). If it finds a matching updateTime, it executes my function. My concern is that at a large scale the cron job will take a while to search the db and could miss a time.

I've also thought about dynamically creating a dedicated cron job for that time whenever the user schedules a new update. I'm unsure how to accomplish this.

Are there any other methods that may work? Are my concerns about node-cron unfounded?

KENdi
Celso

1 Answer

There are two approaches I can think of:

  1. Keep track of the last timestamp you processed
  2. Keep the "things to process" in a queue

Keep track of the last timestamp you processed

When you process items, you use the current timestamp as the cut-off point for your query. Something like:

var now = Date.now();
var query = ref.orderByChild("updateTime").endAt(now);

Now make sure to store this value of now somewhere (for example, in your database) so that you can reuse it next time to retrieve the next batch of items:

var previous = lastProcessed; // hypothetical variable: the value of now stored by the previous run
var now = Date.now();
var query = ref.orderByChild("updateTime").startAt(previous).endAt(now);

With this you're only processing a single slice at a time. The only tricky bit is that somebody might insert a new node with an updateTime that you've already processed. If this is a concern for your use-case, you can prevent them from doing so with a validation rule on updateTime:

".validate": "newData.val() >= root.child('lastProcessed').val()"

As you add more items to the database, you will indeed be querying more items. So there is a scalability limit to this approach, but it should work well for anything up to a few hundred thousand nodes (I haven't tested this in a while, so ymmv).
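As a rough sketch of one complete run of this approach, the function below reads the stored cut-off, queries the next slice, processes it, and only then stores the new cut-off. The names processNextSlice, itemsRef, cursorRef, and handleItem are assumptions made for illustration; itemsRef and cursorRef are assumed to be Realtime Database references (cursorRef could point at the lastProcessed node used in the rule above). Because startAt and endAt are both inclusive, the sketch starts one millisecond after the previous cut-off so that an item sitting exactly on the boundary is not processed twice; that adjustment is an addition to the queries shown above.

```javascript
// Sketch only: itemsRef and cursorRef are assumed to be Firebase Realtime
// Database references, and handleItem is a hypothetical callback that may
// return a promise.
function processNextSlice(itemsRef, cursorRef, handleItem, now) {
  return cursorRef.once("value").then(function (cursorSnap) {
    // On the very first run there is no stored cut-off yet.
    var previous = cursorSnap.val() || 0;
    // startAt/endAt are inclusive, so begin just after the previous cut-off.
    var query = itemsRef
      .orderByChild("updateTime")
      .startAt(previous + 1)
      .endAt(now);
    return query.once("value");
  }).then(function (snapshot) {
    var work = [];
    snapshot.forEach(function (child) {
      // No return value here: returning true from forEach stops the iteration.
      work.push(Promise.resolve(handleItem(child.val(), child.key)));
    });
    return Promise.all(work);
  }).then(function (results) {
    // Advance the cut-off only after every item in the slice was handled.
    return cursorRef.set(now).then(function () {
      return results.length;
    });
  });
}
```

If handleItem throws or rejects, the cut-off is not advanced, so the same slice is picked up again on the next run. Whether that retry behaviour is desirable depends on whether your processing function is safe to run twice.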

Keep the "things to process" in a queue

An alternative approach is to keep a queue of items that still need to be processed. The clients add the items that they want processed to the queue, with an updateTime of when they want them to be processed. Your server then picks the items from the queue, performs the necessary updates, and removes each item from the queue:

var now = Date.now();
// everything in the queue with an updateTime up to now is due
var query = ref.orderByChild("updateTime").endAt(now);
query.once("value").then(function(snapshot) {
  snapshot.forEach(function(child) {

    // TODO: process the child node

    // remove the child node from the queue
    child.ref.remove();
  });
});

The difference with the earlier approach is that a queue's stable state is going to be empty (or at least quite small), so your queries will run against a much smaller list. That's also why you won't need to keep track of the last timestamp you processed: any item in the queue up to now is eligible for processing.
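A slightly fuller sketch of the server side could look like the following. It assumes queueRef is a Realtime Database reference to the separate queue node and that processItem is your own function (both names, as well as drainQueue and startPolling, are hypothetical). Unlike the snippet above, it removes an entry only after processItem has finished, so an item whose processing fails stays in the queue for the next run.

```javascript
// Sketch only: queueRef is assumed to be a Firebase Realtime Database
// reference to the queue node; processItem is a hypothetical callback that
// may return a promise.
function drainQueue(queueRef, processItem, now) {
  var query = queueRef.orderByChild("updateTime").endAt(now);
  return query.once("value").then(function (snapshot) {
    var work = [];
    snapshot.forEach(function (child) {
      // Process first, then remove, so failed items remain queued.
      var done = Promise.resolve(processItem(child.val(), child.key))
        .then(function () {
          return child.ref.remove();
        });
      work.push(done);
    });
    return Promise.all(work);
  }).then(function (results) {
    return results.length; // number of items processed and removed
  });
}

// Hypothetical wiring: drain the queue at a fixed interval. One minute
// matches the minute-level granularity mentioned in the question.
function startPolling(queueRef, processItem, intervalMs) {
  return setInterval(function () {
    drainQueue(queueRef, processItem, Date.now()).catch(function (err) {
      console.error("queue run failed", err);
    });
  }, intervalMs);
}
```

Calling startPolling(queueRef, processItem, 60 * 1000) would check the queue once a minute. Since every run asks for everything up to now, an item that a slow or skipped run did not get to is simply picked up by the next run rather than lost.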

Frank van Puffelen
  • Frank, great ideas here. I'm leaning towards the second option. Do you think the queue should be separate from my regular database list? I.e., currently I do have a 'queue', but it's stored under each individual user's ID. Are you suggesting I also add the images to a new node named 'queue'? Or can I keep my structure and do the same queries you are suggesting? – Celso Sep 03 '17 at 11:13
  • Option 2 is indeed a separate node for the queue, where you delete the items after you've processed them. – Frank van Puffelen Sep 03 '17 at 12:48
  • Option #2 is the option I'm going with! It makes my db writes a bit more complicated, but it helps with these quick reads. – Celso Sep 19 '17 at 14:43
  • Frank - Given that my app is new and hasn't launched yet, I'm considering migrating my db to Cloud Firestore. A big factor in that decision is having the same ability to quickly query and process images whose update time has been reached. Given my question a few months ago, do you recommend I do the migration? Would your approach #2 translate to having a separate collection for the queue? – Celso Oct 19 '17 at 04:40
  • While Cloud Firestore offers better scalability and significantly simplifies querying over multiple properties, I don't think it offers anything significantly different for this scenario. You'd indeed make a separate collection for the queue. – Frank van Puffelen Oct 19 '17 at 13:50
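
For the Cloud Firestore variant discussed in the last two comments, a minimal sketch of the same queue pattern might look like this. It assumes db is a Firestore instance from the namespaced JavaScript or Admin SDK and that the queue lives in a collection named "queue"; the collection name, drainFirestoreQueue, and processItem are all hypothetical.

```javascript
// Sketch only: db is assumed to be a Cloud Firestore instance, "queue" is an
// assumed collection name, and processItem is a hypothetical callback that
// may return a promise.
function drainFirestoreQueue(db, processItem, now) {
  return db.collection("queue")
    .where("updateTime", "<=", now)
    .get()
    .then(function (querySnapshot) {
      var work = [];
      querySnapshot.forEach(function (doc) {
        // Process first, then delete, so failed items remain in the queue.
        var done = Promise.resolve(processItem(doc.data(), doc.id))
          .then(function () {
            return doc.ref.delete();
          });
        work.push(done);
      });
      return Promise.all(work);
    })
    .then(function (results) {
      return results.length; // number of documents processed and deleted
    });
}
```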