
We are offering a service that people will embed on their websites, and we are hoping to use Firebase as our backend. We would like to base our subscription rates on page views or something similar. Right now we are stumped trying to figure out how to prevent customers from caching our client JS code and omitting any portions that attempt to increment a page-view counter.

What we need to do somehow is create a security rule that atomically prevents someone from reading from one location unless they have incremented the counter at another location. Any ideas on how to do this?

For example, assuming the following schema:

{
  "comments" : {
    "-JYlV8KQGkUk18-nnyHk" : {
      "content" : "This is the first comment."
    },
    "-JYlV8KWNlFZHLbOphFO" : {
      "content" : "This is a reply to the first.",
      "replyToCommentId" : "-JYlV8KQGkUk18-nnyHk"
    },
    "-JYlV8KbT63wL9Sb0QvT" : {
      "content" : "This is a reply to the second.",
      "replyToCommentId" : "-JYlV8KWNlFZHLbOphFO"
    },
    "-JYlV8KelTmBr7uRK08y" : {
      "content" : "This is another reply to the first.",
      "replyToCommentId" : "-JYlV8KQGkUk18-nnyHk"
    }
  },
  "oldPageViews": 32498,
  "pageViews": 32498
}

What would be a way of only allowing read access to the comments if the client first incremented the pageViews field? At first I was thinking about having two fields (something like pageViews and oldPageViews) and starting out by incrementing pageViews, reading the comments, then incrementing oldPageViews to match, and only allowing read on comments if pageViews === oldPageViews + 1. However, unless this could be done atomically, the data could get into a corrupt state if the client started the process but didn't finish it.
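To make the failure mode concrete, here is a plain-JavaScript model of the proposed two-counter scheme (no Firebase involved; the `Store` class is a hypothetical stand-in for the database). It shows that a client who increments `pageViews` but never settles `oldPageViews` keeps the read condition satisfied indefinitely, while two concurrent unsettled increments lock everyone out:

```javascript
// Plain-JS model of the pageViews/oldPageViews idea (hypothetical Store class, no Firebase).
class Store {
  constructor() { this.pageViews = 0; this.oldPageViews = 0; }
  incrementPageViews() { this.pageViews += 1; }
  incrementOldPageViews() { this.oldPageViews += 1; }
  canReadComments() { return this.pageViews === this.oldPageViews + 1; }
}

const store = new Store();

// Well-behaved client: increment, read, then settle the second counter.
store.incrementPageViews();
console.log(store.canReadComments()); // true — read allowed
store.incrementOldPageViews();

// Misbehaving client: increments once, then never settles...
store.incrementPageViews();
console.log(store.canReadComments()); // true
// ...and every later read is still allowed without further increments.
console.log(store.canReadComments()); // still true

// Worse: two clients increment before either settles.
store.incrementOldPageViews();
store.incrementPageViews();
store.incrementPageViews();
console.log(store.canReadComments()); // false — honest readers are now locked out
```

Without an atomic "increment and read" primitive, both the undercounting and the lockout states are reachable.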

Here is a codepen trying to test this idea out.

cayblood
  • This looks like an [XY problem](http://meta.stackexchange.com/questions/66377/what-is-the-xy-problem/66378#66378). What are the other constraints? Example: Presumably there is a static asset users will download? Can you not simply track page impressions using that static asset? The number of comments entered doesn't exactly represent page views. – Kato Oct 27 '14 at 17:57
  • @Kato any downloaded static asset that could be used to track impressions could be cached to prevent ongoing impressions from being tracked. What I ultimately care about is that some counter be incremented every time a firebase resource is connected to. I'm not really concerned about how many comments are accessed per impression. – cayblood Oct 27 '14 at 18:19
  • How about adding a .read rule that checks a timestamp? For example, you could force the user to write a timestamp within 5 minutes of `now` before they can read any data? If that sounds reasonable I'll add details. – Kato Oct 27 '14 at 18:23
  • @Kato that sounds very similar to Justin's answer below. Is that what you were thinking of? – cayblood Oct 27 '14 at 18:41
  • I don't think the page count is necessary and probably overly complicates things. Also, Justin's answer would require you to update the timestamp twice per second, which seems like overkill. If they seem similar enough to you then you can probably work out the details. – Kato Oct 27 '14 at 18:50
  • @Kato how else do I keep track of how many times my plugin has been loaded than with a counter? If you have a simpler solution and time to explain it, I'd love to see it. Also, with Justin's answer below, will I need to keep incrementing the counter every 500ms or will the check only be performed when I first connect to the resource? – cayblood Oct 27 '14 at 19:32
  • Well, some difficulties I see here are a) enforcing that the counter is updated and b) translating a heartbeat every second into "visits" and c) quantifying those visits in a legally sound way and d) any event listeners will be canceled if an update fails since there is only a 500ms window to update the timestamp. – Kato Oct 27 '14 at 19:44

2 Answers


I would suggest a variation of Kato's rate-limiting answer: https://stackoverflow.com/a/24841859/75644

Data:

{
  "comments": {
    "-JYlV8KQGkUk18-nnyHk": {
      "content": "This is the first comment."
    },
    "-JYlV8KWNlFZHLbOphFO": {
      "content": "This is a reply to the first.",
      "replyToCommentId": "-JYlV8KQGkUk18-nnyHk"
    },
    "-JYlV8KbT63wL9Sb0QvT": {
      "content": "This is a reply to the second.",
      "replyToCommentId": "-JYlV8KWNlFZHLbOphFO"
    },
    "-JYlV8KelTmBr7uRK08y": {
      "content": "This is another reply to the first.",
      "replyToCommentId": "-JYlV8KQGkUk18-nnyHk"
    },
    "timestamp" : 1413555509137
  },
  "pageViews" : {
    "count" : 345030,
    "lastTs" : 1413555509137
  }
}

Security Rules:

{
  "rules": {
    "pageViews": {
      ".validate": "newData.hasChildren(['count','lastTs'])",
      "count": {
        ".validate": "newData.exists() && newData.isNumber() && (!data.exists() || newData.val() > data.val())"
      },
      "lastTs": {
        // timestamp can't be deleted or I could just recreate it to bypass our throttle
        ".write": "newData.exists()",
        // the new value must be at least 500 milliseconds after the last (no more than one update every 500 ms)
        // the new value must equal `now` (it will be, since `now` is the server time when the write arrives, unless I try to cheat)
        ".validate": "newData.isNumber() && newData.val() === now && (!data.exists() || newData.val() > data.val()+500)"
      }
    },
    "comments": {
      // The comments can't be read unless the pageViews lastTs value is within 500 milliseconds of now
      ".read": "root.child('pageViews').child('lastTs').val() > now - 501",
      ".write": true
    }
  }
}

NOTE: I haven't tested this, so you'll need to play around with it a bit to see if it works.

Also, based on your sample data, I didn't deal with uids. You'll need to make sure you're managing who can read/write here.
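Since the rules above are untested, here is a plain-JavaScript model of their logic (the helper names are hypothetical; no Firebase involved) that lets you sanity-check the throttle and the read window before deploying. `now` plays the role of the server timestamp in security rules:

```javascript
// lastTs: must equal the server time and be at least 500 ms after the previous value
function lastTsIsValid(newVal, oldVal, now) {
  return typeof newVal === 'number' && newVal === now &&
         (oldVal === null || newVal > oldVal + 500);
}

// comments: readable only while lastTs is within 500 ms of now
function commentsAreReadable(lastTs, now) {
  return lastTs > now - 501;
}

const t0 = 1413555509137;
console.log(lastTsIsValid(t0, null, t0));           // true — first write is allowed
console.log(lastTsIsValid(t0 + 200, t0, t0 + 200)); // false — throttled, less than 500 ms since last
console.log(lastTsIsValid(t0 + 600, t0, t0 + 600)); // true — past the throttle window
console.log(commentsAreReadable(t0, t0 + 400));     // true — still inside the 500 ms read window
console.log(commentsAreReadable(t0, t0 + 600));     // false — window expired, client must write again
```

The last line also illustrates Kato's objection in the comments: with a 500 ms window, any client that pauses longer than half a second loses read access until it writes a fresh timestamp.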

Justin Noel
  • Great adaptation, Justin! One thing that seems to be missing: I don't see any rule that forces pageViews/count to be updated on each iteration. – Kato Oct 27 '14 at 19:36
  • @Kato is the concern that a user could undercount visits as long as they incremented the count at least once per half-second? That may be difficult enough to pull off that we don't care as much about it, but if you have a solution, even better. – cayblood Oct 27 '14 at 19:56
  • I don't think there is anything in the security rules that forces `count` to update, just `lastTs`. – Kato Oct 27 '14 at 20:07
  • Thanks. Without testing, I might have missed it. However, I THINK this does it : `newData.val() > data.val()`. They can't update lastTs without updating count. Maybe you need to also throw in newData.hasChildren to force both count and lastTs – Justin Noel Oct 27 '14 at 22:05
  • Added `newData.hasChildren(['count','lastTs'])` to ensure user updates both fields. Then, the `newData.val() > data.val()` validation rule in counts will force them to increment. – Justin Noel Oct 27 '14 at 22:08

Justin's adaptation to the throttling code seems like a great starting point. There are a few annoying loopholes left, like forcing the counter to be updated, getting quantifiable metrics/analytics out of your counter (which requires hooking into a stats tool by some means and will be necessary for accurate billing reports and customer inquiries), and also being able to accurately determine when a visit "ends."

Building from Justin's initial ideas, I think a lot of this overhead can be omitted by simplifying the amount the client is responsible for. Maybe something like:

  1. Only force the user to update a timestamp counter
  2. Employ a node.js script to watch for updates to the counter
  3. Let the node.js script "store" the audit data, preferably by sending it to analytics tools like keen.io, intercom.io, etc.

Starting from this base, I'd adapt the security rules and structure as follows:

{
  "rules": {
    "count": {
      // updated only from node.js script
      // assumes our node worker authenticates with a special uid we created 
      // http://jsfiddle.net/firebase/XDXu5/embedded/result/
      ".write": "auth.uid === 'ADMIN_WORKER'", 
      ".validate": "newData.exists() && newData.isNumber() && (!data.exists() || newData.val() > data.val())"
    },
    "lastTs": {
      // timestamp can't be deleted or I could just recreate it to bypass our throttle
      ".write": "newData.exists()",
      // the new value must be equal to `now` (i.e. written with Firebase.ServerValue.TIMESTAMP)
      ".validate": "newData.isNumber() && newData.val() === now"
    },
    "comments": {
      // The comments can't be read unless the lastTs value is within 30 seconds of now
      // (lastTs lives at the root in this structure, per the rules above)
      ".read": "root.child('lastTs').val() > now - 30000",
      "$comment": {
        ".write": "???"     
      }
    }
  }
}
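With this structure, the embedded client only has to refresh `lastTs` on an interval shorter than 30 seconds to keep the `.read` window open for the life of a session. A minimal sketch of that heartbeat, with `writeTimestamp` as a hypothetical stand-in for something like `ref.child('lastTs').set(Firebase.ServerValue.TIMESTAMP)` so it can run without Firebase:

```javascript
// Heartbeat helper: calls writeTimestamp immediately, then again every intervalMs.
function startHeartbeat(writeTimestamp, intervalMs) {
  writeTimestamp();                            // open the read window right away
  const timer = setInterval(writeTimestamp, intervalMs);
  return () => clearInterval(timer);           // returns a stop function for session end
}

// Example: count writes with a fake callback instead of hitting Firebase.
let writes = 0;
const stop = startHeartbeat(() => { writes += 1; }, 25000); // refresh every 25 s (< 30 s window)
stop(); // stop the heartbeat when the session ends
console.log(writes); // 1 — the immediate write fired; the interval was cancelled before repeating
```

Choosing an interval comfortably below the 30-second window (25 s here, an arbitrary choice) leaves slack for network latency so the window never lapses mid-session.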

Now I would write a simple node script to perform the count and administrative tasks:

var Firebase = require('firebase');
var ref = new Firebase(URL);
ref.child('lastTs').on('value', heartbeatReceived);
var lastCheck = null;

function heartbeatReceived(snap) {
  if( snap.val() === null ) { return; } // initial event before any heartbeat is written
  if( isNewSession(snap.val()) ) {
    incrementCounter();
  }
  updateStatsEngine(snap);
}

function incrementCounter() {
  ref.child('count').transaction(function(currVal) {
    return (currVal||0) + 1;
  });
}

function isNewSession(timestamp) {
  // the criteria here is pretty arbitrary and up to you, maybe
  // something like < 30 minutes since last update or the same day?
  var res = lastCheck === null || timestamp - lastCheck > 30 * 60 * 1000;
  lastCheck = timestamp;
  return res;
}

function updateStatsEngine(snap) {
  // contact keen.io via their REST API
  // tell intercom.io that we have an event
  // do whatever is desired to store quantifiable stats
  // and track billing info
  //
  //var client = require('keen.io').configure({
  //    projectId: "<project_id>",
  //    writeKey: "<write_key>",
  //    readKey: "<read_key>",
  //    masterKey: "<master_key>"
  //});
  //
  //client.addEvent("collection", {/* data */});
}

The downside of this approach is that if my admin script goes down, any events during that time are not logged. However, the wonderful thing about this script is its simplicity.

It's not going to have many bugs. Add monit, upstart, or another tool to make sure it stays up and does not crash. Job done.

It's also highly versatile. I can run it on my laptop or even my Android phone (as an HTML page) in a pinch.

Kato
  • Thanks @Kato. With this solution, is it possible to have heartbeats occurring so quickly that the admin script wouldn't get them all? – cayblood Oct 27 '14 at 20:14
  • One other question @Kato. We would like people to continue to be able to read from the comments location as long as their session is open, which in many cases will be longer than 30s. For this to work, would we need to have the client keep updating the timestamp every 30s? – cayblood Oct 27 '14 at 20:22
  • It should not be possible for them to be missed. That's the point of real-time scalable data sync. They also don't need to be particularly frequent, just some interval less than every 30 seconds according to this example (so that the .read permission does not expire). Adjust according to your needs; those intervals are entirely arbitrary (30 mins seems like a great check interval) – Kato Oct 27 '14 at 20:33