
I want to track and report on users in chat rooms, and I'm not sure how best to structure the data in Firebase.

General Situation

  • Users each have a unique user_id
  • The chat rooms are always opening and closing
  • Each chat room has its own unique room_id
  • The users often enter and leave chat rooms that are open
  • If a chat room is closed, users cannot enter it
  • One user might be in more than one chat room at any one time

Getting the data

We have access to an API that returns JSON. I plan to poll the API every minute to find all the chat rooms (room_id), then request all the users (user_id) for each room.

Setting the data

The setting of the data is totally under our control

Reporting I want to be able to get

  • How many unique users have we seen from x to y date & time
  • Time spent online for 1 user from x to y date & time

Questions

  • Will Firebase timestamp each record for me, or do I need to write the time into each record?
  • Is it best to use the Unix epoch or a more readable date-time format?
  • How should I structure this data in Firebase?
Bill
  • Have you read https://www.firebase.com/docs/web/guide/structuring-data.html? You can use `Firebase.timestamp` or `priority` by `timestamp`. You can record the visits into separate objects under `room_id` to avoid the anti-pattern of nesting long lists inside other objects, such as storing this info in your room object. The best is to read the Firebase documentation more than once. – webduvet Sep 09 '15 at 13:31
  • Thank you for the time stamp pointer I found this page https://www.firebase.com/docs/web/api/servervalue/timestamp.html which gave me this handy snip 'Firebase.ServerValue.TIMESTAMP' – Bill Sep 09 '15 at 13:41
  • I'm not wanting to report against the room_id but i am interested in the user_id and the time, is this why you suggest structuring the data under the room_id ? – Bill Sep 09 '15 at 13:45

1 Answer


Will firebase time stamp each record for me? or do I need to write the time into each record?

Nope, but you can use Firebase.ServerValue.TIMESTAMP as mentioned in the docs. Firebase stores only what you ask it to store.

Is it best using the unix Epoch or a more understandable date time?

Use Firebase.ServerValue.TIMESTAMP (which is a Unix Epoch) for all datetimes (if possible). This ensures consistency and correctness when compared with using new Date().getTime() or any other method which is dependent on the local machine's time (which is often wrong, so you'll end up with messed up data).

Unix Epochs are also integers which work very well with Firebase's querying abilities, specifically we can use .startAt() and .endAt() to fetch things from a specific date range (as we'll see below in the answer).
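As a small illustration of why integer timestamps suit range queries, the sketch below converts two dates to epoch milliseconds and filters a plain in-memory array with inclusive bounds, the way a .startAt()/.endAt() range behaves. The inRange helper and the sample records are hypothetical, and a real query would run on Firebase rather than in local code.

```javascript
// Hypothetical helper: keep records whose timestamp falls inside [start, end].
// Both bounds are inclusive, mirroring a startAt()/endAt() range.
function inRange(records, startDate, endDate) {
  var start = startDate.getTime(); // epoch milliseconds
  var end = endDate.getTime();
  return records.filter(function (record) {
    return record.timestamp >= start && record.timestamp <= end;
  });
}

// Made-up sample data, for illustration only.
var sampleRecords = [
  { user_id: "abe", timestamp: Date.UTC(2015, 8, 1, 12, 0, 0) },
  { user_id: "bill", timestamp: Date.UTC(2015, 8, 9, 13, 31, 0) },
  { user_id: "abe", timestamp: Date.UTC(2015, 8, 20, 8, 0, 0) }
];

// Records from 1 Sep 2015 through 15 Sep 2015 (UTC).
var firstHalfOfSeptember = inRange(
  sampleRecords,
  new Date(Date.UTC(2015, 8, 1)),
  new Date(Date.UTC(2015, 8, 15))
);
```

Because the comparison is plain integer ordering, no date parsing or time zone handling is needed at query time.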

How should I structure this data in firebase?

The first question you need to ask is "how will I be consuming this data?" Firebase isn't a big SQL database where we can get our structure kind of right then lean on complex querying to make up for our mistakes.

When you build a structure in Firebase, ensure that it allows you to load your data in a specific way. This means that if you know you're going to have a list of room_ids that you'll want to load data from, then your room structure should be based around those IDs.

Consider a structure like this for a simple chat room (we'll use $ notation to indicate wild cards).

{
  "rooms": {
    $room_id: {
      "users": {
        $user_id: true
      },
      "_meta": {
        closed: Boolean
      },
      "messages": {
        $message_id: {
          "user_id": $user_id,
          "text": ""
        }
      }
    }
  },
  "users": {
    $user_id: {...}
  }
}

When a user with an id of abe joins a room with a room_id of room_one, we know that they need to mark themselves as an active member of the chat room by setting the location /rooms/room_one/users/abe to true.

Our function to join a room would look like this.

function joinRoom(room_id) {
  // We assume `ref` is a Firebase reference to the root of our Firebase
  // and `myUserId` holds the current user's user_id
  var roomRef = ref.child("rooms").child(room_id);
  roomRef.child("users").child(myUserId).set(true);
  return roomRef;
}

This is being specific. We're given some information and because our data structure is logical we can easily make assumptions about what data needs to be written without loading any data from Firebase.

This isn't good enough for your situation though, since you also want reporting. We'll incrementally improve our structure based on your needs.

How many unique users have we seen from x to y date & time

Assuming you're talking on a per-room basis, this is an easy change.

{
  "rooms": {
    $room_id: {
      "users": {
        $user_id: true
      },
      "users_history": {
        $push_id: {
          user_id: ...,
          timestamp: ...
        } 
      },
      "messages": {
        $message_id: {...}
      }
    }
  },
  "users": {
    $user_id: {...}
  }
}

We add the /rooms/$room_id/users_history location. This is a list of every time a user enters this room. We've added a bit of complexity, so our join room function would look like this.

function joinRoom(room_id) {
  var roomRef = ref.child("rooms").child(room_id);
  roomRef.child("users_history").push({
    user_id: myUserId,
    timestamp: Firebase.ServerValue.TIMESTAMP
  });
  roomRef.child("users").child(myUserId).set(true);
  return roomRef;
}

Now we can easily report how many users have been in a room in a given time using a Firebase Query.

function roomVisitors(room_id, start_datetime, end_datetime) {
  // Query the users_history list, which holds the timestamped entries
  var historyRef = ref.child("rooms").child(room_id).child("users_history"),
      queriedHistoryRef = historyRef
        .orderByChild('timestamp')
        .startAt(start_datetime.getTime())
        .endAt(end_datetime.getTime());

  // Assuming an ES6-compatible Promise implementation is available
  return new Promise(function (resolve, reject) {
    queriedHistoryRef.once("value", function (visits) {
      /* visits is a snapshot of every entry into the room
         within the given range of time. */
      resolve(visits.val());
    }, function (err) {
      reject(err);
    });
  });
}

We'll talk about whether or not doing this is truly "specific" in a moment, but this is the general idea.
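The query above returns one record per visit, so the same user can appear several times. A minimal sketch of reducing that result to a unique count follows. It assumes the resolved value has the users_history shape shown earlier, and the countUniqueVisitors name and the sample push IDs are made up for illustration.

```javascript
// `historyValue` is assumed to be what snapshot.val() returns for a
// users_history query: { pushId: { user_id: ..., timestamp: ... }, ... }.
// It is null when the query matched nothing.
function countUniqueVisitors(historyValue) {
  var seen = {};
  var count = 0;
  Object.keys(historyValue || {}).forEach(function (pushId) {
    var userId = historyValue[pushId].user_id;
    // Count each user_id only the first time it appears.
    if (!Object.prototype.hasOwnProperty.call(seen, userId)) {
      seen[userId] = true;
      count += 1;
    }
  });
  return count;
}

// Made-up sample: abe entered twice, bill once.
var exampleHistory = {
  "-pushA": { user_id: "abe", timestamp: 1441800000000 },
  "-pushB": { user_id: "bill", timestamp: 1441803600000 },
  "-pushC": { user_id: "abe", timestamp: 1441807200000 }
};
var uniqueCount = countUniqueVisitors(exampleHistory);
```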

Time spent online for 1 user from x to y date & time

We haven't fleshed out our /users/$user_id structure yet, so we'll do that here. In this situation the only information we'll have when looking up a user's time spent online is their user_id. So we'll store this information under /users/$user_id: if we stored it under /rooms/, we would have to load data for all the rooms and loop through it to find the relevant user information, which is not very specific.

{
  "rooms": {
    $room_id: {
      "users": {
        $user_id: true
      },
      "users_history": {
        $push_id: {
          user_id: ...,
          timestamp: ...
        } 
      },
      "messages": {
        $message_id: {...}
      }
    }
  },
  "users": {
    $user_id: {
      "online_history": {
        $push_id: {
          "action": "", // "online" or "offline" 
          "timestamp": ... 
        }
      }
    }
  }
}

Now we can build a ref.onAuth(func) that tracks our time online.

var userRef;
ref.onAuth(function (auth) {
  if (!auth && userRef) {
    // If we have no auth, i.e. we logged out, cancel any onDisconnect's
    userRef.onDisconnect().cancel();
    // and push a record saying the user went offline
    userRef.child("online_history").push({
      action: "offline",
      timestamp: Firebase.ServerValue.TIMESTAMP
    });
  } else if (auth) {
    userRef = ref.child('users').child(auth.uid);
    // add a record that we came online
    userRef.child('online_history').push({
      action: "online",
      timestamp: Firebase.ServerValue.TIMESTAMP
    });
    // and if the user disconnects, add a record of going offline
    userRef.child('online_history').push().onDisconnect().set({
      action: "offline",
      timestamp: Firebase.ServerValue.TIMESTAMP
    });
  }
});

Using this method we can now write a function to loop through the online/offline log and add up time for a given range using the same method of querying used above, but I'll leave this as an exercise for the reader.
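One possible shape for that function is sketched below as plain JavaScript with no Firebase calls. It assumes the log has already been loaded into an array sorted by timestamp, in the online_history record shape above. The totalOnlineTime name, the clamping of sessions to the range, and the choice to treat a trailing "online" with no matching "offline" as lasting until the end of the range are all assumptions of this sketch.

```javascript
// Sum the milliseconds a user was online within [rangeStart, rangeEnd].
// `events` is assumed to be sorted by timestamp, oldest first.
function totalOnlineTime(events, rangeStart, rangeEnd) {
  var total = 0;
  var onlineSince = null; // timestamp of the currently open session, if any

  // Add the part of a session that overlaps the requested range.
  function addSession(from, to) {
    var start = Math.max(from, rangeStart);
    var end = Math.min(to, rangeEnd);
    if (end > start) {
      total += end - start;
    }
  }

  events.forEach(function (event) {
    if (event.action === "online" && onlineSince === null) {
      onlineSince = event.timestamp;
    } else if (event.action === "offline" && onlineSince !== null) {
      addSession(onlineSince, event.timestamp);
      onlineSince = null;
    }
    // Repeated "online" or "offline" records are ignored.
  });

  // Assumption: an unmatched "online" means the user is still online.
  if (onlineSince !== null) {
    addSession(onlineSince, rangeEnd);
  }
  return total;
}

// Made-up log with small timestamps to keep the arithmetic readable.
var exampleLog = [
  { action: "online", timestamp: 1000 },
  { action: "offline", timestamp: 4000 },
  { action: "offline", timestamp: 4500 },
  { action: "online", timestamp: 6000 },
  { action: "offline", timestamp: 9000 },
  { action: "online", timestamp: 12000 }
];
var onlineMs = totalOnlineTime(exampleLog, 2000, 8000); // 2000 + 2000 = 4000
```

Note that a session which started before the range only counts if its "online" record is included in the input, so a range query on timestamp alone may need to be widened to pick up the preceding record.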

Notes about specificity and performance

Neither of the reporting functions is specific. When we get the list of users who visited a room in the first query, we pull down a big object filled with visit records and parse it client-side, when all we really want is a single integer: the number of unique visitors.

This is a situation where you really want to employ a NodeJS worker using the server-side SDK. This worker can sit and watch changes to your data structure and automatically summarize data as it changes so your client can then look at a location like /rooms/$room_id/_meta/analytics/uniqueVisitorsThisWeek and simply get a number like 10.
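The summarizing step of such a worker can be kept as a small pure function, sketched below. The summary shape ({ seen, uniqueVisitors }) and the recordVisit name are hypothetical. In a real worker this would presumably be called from a child_added listener on users_history, with the result written back to the analytics location, and resetting the count each week is not handled here.

```javascript
// Hypothetical summary shape: { seen: { user_id: true }, uniqueVisitors: n }.
// Returns the same object when the user was already counted, otherwise a new
// summary with the user added and the counter incremented.
function recordVisit(summary, userId) {
  var seen = summary.seen || {};
  if (Object.prototype.hasOwnProperty.call(seen, userId)) {
    return summary; // already counted, nothing to write
  }
  var nextSeen = Object.assign({}, seen);
  nextSeen[userId] = true;
  return {
    seen: nextSeen,
    uniqueVisitors: (summary.uniqueVisitors || 0) + 1
  };
}

// Simulate three visit events arriving one at a time.
var roomSummary = { seen: {}, uniqueVisitors: 0 };
["abe", "bill", "abe"].forEach(function (userId) {
  roomSummary = recordVisit(roomSummary, userId);
});
```

Returning the unchanged object for repeat visitors lets the caller skip the write entirely when nothing changed.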

The point is that storage is cheap, and summarizing and caching data like this is cheap too, but only if it's done server-side. If you're not specific, load too much, and attempt to summarize client-side, you'll waste CPU cycles and bandwidth.

If you're ever loading data onto a client from Firebase and not displaying that data, you should be reworking your data structure to be more specific.

Abe Haskins