
I am managing my PC time-series data in MongoDB in the following format:

  • I create one document per server per hour.
  • I then store the cpuMetric data for each minute in the format shown below.

The problem is that I don't know how to use the data inside cpuMetric in an aggregation.

More specifically, I want to get the last 10 minutes of data from this document.


{ "_id" : "192.168.xxx.xxx1440yyy000", 
      "time" : ISODate("2015-yy-xxT05:30:00Z"),
      "ip" : "192.168.xxx.xxx", 
      "serverId" : "abc", 
      "cpuMetric" : { 
        "0" : { "usage" : 25.99, "process" : 123, "cores" : 4, "speed" : 2394, "uptime" : 45839 }, 
        "1" : { "speed" : 2394, "uptime" : 45899, "usage" : 26.003333333333334, "process" : 121, "cores" : 4 }, 
        .
        .
        .
        "58" : { "usage" : 26.093333333333334, "process" : 119, "cores" : 4, "speed" : 2394, "uptime" : 45959 }, 
        "59" : { "usage" : 26.73, "process" : 119, "cores" : 4, "speed" : 2394, "uptime" : 46019 }
      }
    }

Thanks in advance!!

Neil Lunn
Abinash Kumar
  • You want to "get the last ten minutes" how? Add them up? Average them? Just list them out? Please explain. – Blakes Seven Aug 26 '15 at 07:56
  • I am sorry I didn't make the structure clear. cpuMetric.0 means the 0th minute's data; similarly, cpuMetric.59 means the 59th minute's data. Now I just want to list cpuMetric[49:59]. – Abinash Kumar Aug 26 '15 at 07:59
  • I understand how to read it. I am asking "you" to tell "us" how you want to "use the data". All you say is "get the last ten minutes". But then do what with it? – Blakes Seven Aug 26 '15 at 08:00
  • I want to just **list them out**. – Abinash Kumar Aug 26 '15 at 08:07
  • Well, you stored them this way, so what do you "think" you do? Paths to objects with named keys like this need to be explicit. You cannot say `cpuMetric[50..59]`. So the only ways to do it are where you can alter the document ( aggregate and mapReduce ) and either explicitly list each one ( aggregate ) or do it in code ( mapReduce ). But both are overkill just to list one document. Loop the index values in your client code. But you need to return the "whole" document and not just 10 time periods. Or you change the structure. – Blakes Seven Aug 26 '15 at 08:11
  • Thanks for your advice, but I was referring to [this document](http://blog.mongodb.org/post/65517193370/schema-design-for-time-series-data-in-mongodb). The question is just for my test environment, where I have only one server and one document, but in production I will have to parse many documents, as I will be monitoring many servers. – Abinash Kumar Aug 26 '15 at 08:17
  • If you have other questions then please post them as [another question.](http://stackoverflow.com/questions/ask) – Blakes Seven Aug 26 '15 at 08:20
  • @AbinashKumar Have you thought about using a time-series database for this use case? They are optimized for time range queries and aggregations. – Sergei Rodionov Aug 26 '15 at 20:45
  • @SergeiRodionov Yes, I have looked into time-series databases such as 'InfluxDB'. But due to lack of time I could not test InfluxDB much, so I am going with MongoDB. If I do another time-series project in the future, I will surely have a look into that. – Abinash Kumar Aug 27 '15 at 07:14
  • @AbinashKumar For what it's worth, modern TSDBs are able to store tuples (millisecond time + float value) in less than 15-20 bytes before compression. Is this in line with document dbs such as mongo? – Sergei Rodionov Aug 27 '15 at 16:17

1 Answer


If it is not too late, it is a good idea to change your data model. With the current one you can't use indexes, because the minute values are used as field names. A better approach is to keep the per-minute data in an array, as follows, and store the minute as a date value inside each element. Then you can index the cpuMetric.minute field and easily sort your data.

{ "_id" : "192.168.xxx.xxx1440yyy000", 
  "time" : ISODate("2015-yy-xxT05:30:00Z"),
  "ip" : "192.168.xxx.xxx", 
  "serverId" : "abc", 
  "cpuMetric" : [
    { "minute" : ISODate("2015-yy-xxT05:30:00Z"), "usage" : 25.99, "process" : 123, "cores" : 4, "speed" : 2394, "uptime" : 45839 },
    { "minute" : ISODate("2015-yy-xxT05:31:00Z"), "speed" : 2394, "uptime" : 45899, "usage" : 26.003333333333334, "process" : 121, "cores" : 4 },
    ...
  ]
}

After that, you can query and sort your data on the cpuMetric.minute field.

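// Timestamp ten minutes before now.
var tenMinutesAgo = new Date(Date.now() - 10 * 60 * 1000);
// Match documents that hold at least one sample from the last ten minutes.
db.pcmetrics.find(
   { cpuMetric: { $elemMatch: { minute: { $gte: tenMinutesAgo } } } }
)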
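
Note that `$elemMatch` in the query selects whole documents, so a matched document still carries its full cpuMetric array. A minimal sketch of trimming that array in client code, assuming the array layout proposed above; `recentSamples` and the sample document are hypothetical and only serve as illustration:

```javascript
// Hypothetical helper: keep only the samples taken at or after `since`,
// oldest first. Assumes each cpuMetric element has a `minute` Date field.
function recentSamples(doc, since) {
  return doc.cpuMetric
    .filter(function (sample) { return sample.minute >= since; })
    .sort(function (a, b) { return a.minute - b.minute; });
}

// Made-up document for illustration (values are not real measurements).
var doc = {
  serverId: "abc",
  cpuMetric: [
    { minute: new Date("2015-08-26T05:30:00Z"), usage: 25.99 },
    { minute: new Date("2015-08-26T05:45:00Z"), usage: 26.0 },
    { minute: new Date("2015-08-26T05:55:00Z"), usage: 26.73 }
  ]
};

// A fixed "now" keeps the example reproducible.
var now = new Date("2015-08-26T06:00:00Z");
var tenMinutesAgo = new Date(now.getTime() - 10 * 60 * 1000);
var recent = recentSamples(doc, tenMinutesAgo);
console.log(recent.length); // prints 1 (only the 05:55 sample qualifies)
```

With at most 60 elements per document, filtering on the client like this should be cheap.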
cubbuk
  • I accept your answer, but could you help me with my own data model format? I am referring to [this document](http://blog.mongodb.org/post/65517193370/schema-design-for-time-series-data-in-mongodb) – Abinash Kumar Aug 26 '15 at 08:10
  • Well, then you can sort your data by the `time` field. This will give you the last hour, and you can then project the last ten minutes out of it manually. This is feasible because your record would contain at most 60 elements, so it won't be very expensive to pick the requested 10 entries out of it. – cubbuk Aug 26 '15 at 08:15
  • Thanks for the solution. I will compare the time complexities to see whether changing the schema or keeping the same schema is better. – Abinash Kumar Aug 26 '15 at 08:28
  • I think in your case it won't matter much, as your documents contain very few entries, just 60 in this case. With this approach you only have to check the last 2 documents to be sure you have the last 10 minutes. Just keep in mind that it is usually better not to use values as field names, as it is almost impossible to index them. – cubbuk Aug 26 '15 at 08:30
  • MongoDB stores date values as 8 bytes in BSON. Check the accepted answer: http://stackoverflow.com/questions/6764821/what-is-the-best-way-to-store-dates-in-mongodb – cubbuk Aug 27 '15 at 16:19
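
The approach discussed in these comments (keep the original schema, fetch the most recent hour documents, and pick the minutes out in client code) could look roughly like the sketch below. It is only an illustration under assumptions: `lastMinutes` and `makeHour` are hypothetical helpers, the documents are passed newest hour first, and minutes missing from a document are simply skipped.

```javascript
// Hypothetical helper: collect the newest `count` per-minute samples from
// hour documents that use minute numbers ("0".."59") as field names.
// `docs` must be ordered newest hour first.
function lastMinutes(docs, count) {
  var out = [];
  for (var d = 0; d < docs.length && out.length < count; d++) {
    var metrics = docs[d].cpuMetric || {};
    // Walk the minutes backwards so the newest sample is collected first.
    for (var m = 59; m >= 0 && out.length < count; m--) {
      var sample = metrics[String(m)];
      if (sample !== undefined) {
        out.push({ hour: docs[d].time, minute: m, data: sample });
      }
    }
  }
  return out.reverse(); // oldest first
}

// Builds a made-up hour document holding minutes 0..(minutes - 1).
function makeHour(time, minutes) {
  var cpuMetric = {};
  for (var m = 0; m < minutes; m++) {
    cpuMetric[String(m)] = { usage: 25 + m / 100 };
  }
  return { time: new Date(time), cpuMetric: cpuMetric };
}

// The current hour has only 4 minutes so far; the previous hour is complete.
var docs = [
  makeHour("2015-08-26T06:00:00Z", 4),
  makeHour("2015-08-26T05:00:00Z", 60)
];

var lastTen = lastMinutes(docs, 10);
console.log(lastTen.map(function (s) { return s.minute; }).join(","));
// prints 54,55,56,57,58,59,0,1,2,3
```

In the mongo shell, the input could come from something like `db.pcmetrics.find({ serverId: "abc" }).sort({ time: -1 }).limit(2).toArray()`, assuming the collection is named pcmetrics as in the answer.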