1

I'm recently developing an app on Fitbit. I am considering MongoDB or HBase as it has support for aggregation and support for processing data in Key value format. Example dataset:

{
    "activities-heart": [
        {
            "customHeartRateZones": [],
            "dateTime": "today",
            "heartRateZones": [
                {
                    "caloriesOut": 2.3246,
                    "max": 94,
                    "min": 30,
                    "minutes": 2,
                    "name": "Out of Range"
                },
                {
                    "caloriesOut": 0,
                    "max": 132,
                    "min": 94,
                    "minutes": 0,
                    "name": "Fat Burn"
                },
                {
                    "caloriesOut": 0,
                    "max": 160,
                    "min": 132,
                    "minutes": 0,
                    "name": "Cardio"
                },
                {
                    "caloriesOut": 0,
                    "max": 220,
                    "min": 160,
                    "minutes": 0,
                    "name": "Peak"
                }
            ],
            "value": "64.2"
        }
    ],
    "activities-heart-intraday": {
        "dataset": [
            {
                "time": "00:00:00",
                "value": 64
            },
            {
                "time": "00:00:10",
                "value": 63
            },
            {
                "time": "00:00:20",
                "value": 64
            },
            {
                "time": "00:00:30",
                "value": 65
            },
            {
                "time": "00:00:45",
                "value": 65
            }
        ],
        "datasetInterval": 1,
        "datasetType": "second"
    }
}

What would be an ideal choice to database to store the Sensor data as I wish to do analysis on this data in my application? Thanks!

Community
  • 1
  • 1
Nicole
  • 859
  • 2
  • 8
  • 16

3 Answers3

3

One thing to worry about with Mongo: The overhead of storing data is massive. In a typical RDBMS or time-series DB, it only stores your data, not the metadata (field names and types) with every row.

You should look into Time Series databases like Graphite and InfluxDB. Even Cassandra has some features for this.

On the other hand, as the other poster pointed out, it may be simpler to start with a regular SQL database, and only migrate when you need to. By deferring the choice, you will have a better understanding of the specific trade-offs you want.

One easy DB to get started with is Graphite. It makes a very specific trade-off: The data storage requirement for each graph is constant (i.e. does not get bigger over time, even if you log years of data.) It can also deal with millions of metrics per second. The only downside is that the resolution "ages out" so you can tell it to store 1m resolution for a few days, but then decrease to 10m resolution for a month, then 1h resolution for 1 year, and 1d resolution for 10 years. You can tell it to keep stats (max, min, average, 90th percentile) for each interval. Getting the graph of any arbitrary time span is basically a single disk seek. There are excellent dashboards to view your data (I recommend Grafana).

jonatan
  • 9,011
  • 2
  • 30
  • 34
BraveNewCurrency
  • 12,654
  • 2
  • 42
  • 50
0

NoSQL DBs are a good choice when you have no structure in your data. You can emulate (key, value) functionality in RDBMSes too. The sample data you have shown looks like can be easily normalized and stored in MySQL or SQL Server. Why don't you go for that first? It will be very easily manageable too. Most importantly, you data has a structure.

If performance becomes a problem, you have indexes at your disposal. Even de-normalization. You can find the steps to normalize your data here in this SO answer about Normalization in databases. You can perform aggregation and do as well processing of your data in RDBMSes as you can in any NoSQL solution. Do you have any other reason?

Community
  • 1
  • 1
displayName
  • 13,888
  • 8
  • 60
  • 75
  • The data from the sensors will be large and will be collected at regular intervals. Hence, I was considering a NoSQL db. – Nicole Sep 21 '15 at 06:08
  • @Nielet: Large is not quantifiable to help anyone tell you whether you should go for RDBMS or NoSQL. I know that RDBMS tables handle order of millions of rows without a trouble. I don't know about billions because I didn't encounter that case yet. Don't worry about data coming from sensors. Unless you have some specific fail with RDBMSes, go with them only. – displayName Sep 21 '15 at 13:49
0

You can try out Amazon Redshift because,

  • It has direct json load capabilities using copy commands.
  • It supports full ANSI SQL (as it is based on PostgreSQL).
  • It has analytic functions buit inside.
  • It has support for Python and R if you want to even more "analytics".
  • It has direct connectivity with most popular reporting solutions (Microstrategy, Tableau etc.)
  • Its completely on the AWS cloud.
Paladin
  • 570
  • 3
  • 13