I'm trying to define a time-series model in Django using MongoDB as the backend. I read about some best practices for time-series data on the MongoDB Blog, and I think I understand them well enough. My question is: how do I define such a model using Django's model syntax? I'm not sure whether these would be embedded documents or whether I should simply store arrays or dicts in a model field. Here is the suggested Mongo format:

ideal mongo document format:

{
  timestamp_hour: ISODate("2013-10-10T23:00:00.000Z"),
  type: "memory_used",
  values: {
    0: { 0: 999999, 1: 999999, …, 59: 1000000 },
    1: { 0: 2000000, 1: 2000000, …, 59: 1000000 },
    …,
    58: { 0: 1600000, 1: 1200000, …, 59: 1100000 },
    59: { 0: 1300000, 1: 1400000, …, 59: 1500000 }
  }
}

One solution is to do something like this, where each document holds a day's worth of data:

# models.py
from django.db import models

class timeseries(models.Model):
    date            = models.DateField(null=True)
    # max_length must be an integer, not a string
    value_hour_0    = models.CharField(max_length=1024, blank=True)
    value_hour_1    = models.CharField(max_length=1024, blank=True)
    # value_hour_... (one field per remaining hour, elided)
    value_hour_23   = models.CharField(max_length=1024, blank=True)

Even if I store arrays or dicts in the value_hour_n fields, this doesn't quite offer the querying advantages mentioned in the article, for example addressing a value as timeseries.HR.MIN. Any suggestions?

James

2 Answers

I couldn't disagree more that this structure is an ideal format. I always seem to see this kind of notation used as a 'PHP understanding' of how to model an array, but it does not suit the way Mongo interprets documents.

For reasons that I have covered in much more detail elsewhere, I generally find the following structure more flexible for query purposes:

{
  timestamp_hour: ISODate("2013-10-10T23:00:00.000Z"),
  type: "memory_used",
  values: [
    [ 999999, 999999, …, 1000000 ],
    [ 2000000, 2000000, …, 1000000 ],
    …,
    [ 1600000, 1200000, …, 1100000 ],
    [ 1300000, 1400000, …, 1500000 ]
  ]
}

That way (as explained in the other answer) you are not tied to specifically addressing every part of the path to get to an element. The sub-document notation is one-way: you have to fully specify each key, and you cannot query ranges of things or find values at different positions.
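As a rough illustration of the difference, the sketch below converts the keyed sub-document layout from the question into nested arrays in plain Python and then reads a range of positions with a slice. The helper name `to_nested_arrays` and the shortened sample data are made up for this example; they are not part of MongoDB or Django.

```python
def to_nested_arrays(values):
    """Convert {minute: {second: reading}} into a list of lists.

    Assumes the keys are consecutive integers (or digit strings)
    starting at 0, as in the question's document layout.
    """
    minutes = sorted(values, key=int)
    return [
        [values[m][s] for s in sorted(values[m], key=int)]
        for m in minutes
    ]


# Made-up sample: 2 "minutes" with 3 "seconds" each, to keep it short.
keyed = {
    0: {0: 999999, 1: 999999, 2: 1000000},
    1: {0: 2000000, 1: 2000000, 2: 1000000},
}

nested = to_nested_arrays(keyed)

# Once the data is an array, a range of positions is a plain slice.
first_two_seconds_of_minute_1 = nested[1][0:2]
```

With the keyed layout, the same range would have to be spelled out one key at a time.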

Using arrays, you get the positional notation for free anyway, so you can use values.59 or even values.20.15 if you want to, or otherwise match on keys within documents in the array.
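A minimal sketch of how such a dotted path could be built from Python, assuming the nested-array layout with one inner array per minute and one slot per second, and assuming those arrays already exist in the stored document. The function names `hour_bucket` and `build_update` are hypothetical. The code only builds the filter and `$set` dictionaries; it does not talk to a database.

```python
from datetime import datetime


def hour_bucket(ts):
    """Truncate a timestamp to the start of its hour (the document key)."""
    return ts.replace(minute=0, second=0, microsecond=0)


def build_update(ts, metric_type, reading):
    """Return (filter, update) dicts addressing one slot by position.

    Assumes values[minute][second] holds the reading for that second.
    """
    query = {"timestamp_hour": hour_bucket(ts), "type": metric_type}
    path = "values.{}.{}".format(ts.minute, ts.second)
    update = {"$set": {path: reading}}
    return query, update


query, update = build_update(
    datetime(2013, 10, 10, 23, 20, 15), "memory_used", 1200000
)
# update is {"$set": {"values.20.15": 1200000}}
```

With pymongo, a pair like this would typically be passed as `collection.update_one(query, update)`, where `collection` stands for whatever collection holds these documents.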

For your solution you would need to experiment with this further, but this and the other reading should give you the general gist.

Neil Lunn

You can do what you wrote, but what if you wanted to store values every 2 hours or every 30 minutes? The schema is tied to one interval, so it's not good practice.

What about this:

from django.db import models

class MyModelStat(models.Model):
    # other fields like: nbr_views, nbr_clicks, rates ...
    # on_delete is required from Django 2.0 on; the choices here are assumptions
    my_model = models.ForeignKey(MyModel, related_name="stats",
                                 on_delete=models.CASCADE)
    created_on = models.DateTimeField(auto_now_add=True)
    previous = models.ForeignKey('self', blank=True, null=True, editable=False,
                                 on_delete=models.SET_NULL)

    def save(self, *args, **kwargs):
        current_stats = self.my_model.current_stats
        if current_stats is not None and self.id is None:
            # iterate over the fields, and do your stuff
            self.rates = current_stats.rates + 1
            self.nbr_views = current_stats.nbr_views
            # set the current stat as the previous for the new stat
            self.previous = current_stats
        super(MyModelStat, self).save(*args, **kwargs)



from django.db.models.signals import post_save
from django.dispatch import receiver

@receiver(post_save, sender=MyModelStat)
def set_mymodel_stats(sender, *args, **kwargs):
    """
    Signal handler to ensure that a newly created stats record is always
    chosen as the current stats automatically. It simplifies things greatly.
    The previous revision is also stored for diff purposes.
    """
    instance = kwargs['instance']
    created = kwargs['created']
    if created and instance.my_model:
        instance.my_model.current_stats = instance
        instance.my_model.save()
elmkarami