Python / Django : emulating a multidimensional layer on a MySQL database

Question

I'm working on a Django project where I need to provide a lot of different visualizations on the same data (for example average of a value for each month, for each year / for a location, etc...).

I have been using an OLAP database once in college, and I thought that it would fit my needs, but it appears that it is much too heavy for what I need. Actually the volume of data is not very big, so I don't need any optimization, just a way to present different visualizations of the same data without having to write 1000 times the same code.

So, to recap, I need a python library:

to emulate a multidimensional database (OLAP style would be nice because I think it is quite convenient : star structure, and everything)
non-intrusive, because I can't modify anything on the existing MySQL database
easy-to-use, because otherwise there's no point in replacing some overhead by another.

"much too heavy for what I need" doesn't make much sense. You're going to have to provide details on why a SQL database and a Star Schema are "too heavy". — S.Lott, Jun 01 '10 at 17:15
maybe I am wrong, but OLAP approach is aimed at designing data warehouses, i.e. big amount of data, where changing the dimension on which you analyse the data would take a huge amount of time. So in this case there is some kind of caching mechanism. I meant that I just need the convenience of star (or flake) structure schema, without the overhead of having to install a completely new system... Is it clearer ? — sebpiq, Jun 01 '10 at 20:15

sebpiq · Accepted Answer · 2010-06-28T11:46:05.287

Ok ... I finally came up with my own solution ( https://code.google.com/p/django-cube/ ), because I couldn't find what I wanted.

With a model like this:

class Instrument(models.Model):
    name = models.CharField(max_length=100)

class Musician(models.Model):
    firstname = models.CharField(max_length=100)
    instrument = models.ForeignKey(Instrument)

Create a cube:

>>> c = Cube(['instrument__name', 'firstname'], Musician.objects.all(), len)
... #Cube(dimensions, queryset, aggregation_function)
... #You can use the Django field-lookup syntax for dates and foreign keys !!!

Query the cube along one (or several) dimension(s):

>>> c.measure_dict('firstname', 'instrument__name', full=False) == {
...     'Miles': {
...         'trumpet': {'measure': 1},
...         'sax': {'measure': 0},
...         'piano': {'measure': 0},
...     },
...     'John': {
...         'trumpet': {'measure': 0},
...         'sax': {'measure': 1},
...         'piano': {'measure': 4},
...     },
... }

Use the custom template tags, etc ...

HI Sebpiq, Could you tool be used to handle this scenario. I've tried working with Django-Cube earlier but not in this context: http://stackoverflow.com/questions/9300922/calculating-and-storing-average-data-on-a-daily-weekly-and-monthly-and-yearly-b — Mridang Agarwalla, Feb 16 '12 at 07:17

score 1 · Answer 2 · answered Jun 01 '10 at 14:18

1

Why not just use the standard ORM aggregation functions: http://docs.djangoproject.com/en/dev/topics/db/aggregation/

Wherever you think performance is going to be a hit, you can denormalize that field.

answered Jun 01 '10 at 14:18

lprsd

84,407
47
135
168

No ... Writing all these statistics calculations with Django Orm is extremely painful. As I said, I don't care about performance. I just need a convenient, powerful and flexible way to calculate aggregations on my data. I have been trying quite a lot, and writing quite a lot of code, and django ORM is clearly not suited: That is what multidimensional databases are actually made for ! – sebpiq Jun 01 '10 at 14:29

score 0 · Answer 3 · answered Oct 30 '13 at 13:33

0

There is also http://cubes.databrewery.org/ . Lightweight OLAP engine in python.

It can work on top of existing database schemas, provides OLAP operations, and is easy to use.

answered Oct 30 '13 at 13:33

jjmontes

24,679
4
39
51

score 0 · Answer 4 · answered Jun 01 '10 at 18:14

You have Python defaultdict dictionaries.

If your data is small, simply query all of it and load up a bunch of dictionaries with counts and sums and what-not.

A "star schema" is just a poor-person's inverted database where dimensions (i.e., dictionaries) reference lists of rows. You can summarize those lists of rows to create summary dictionaries.

d1_sum= defaultdict( int )
d1_count= defaultdict( count )
d2_sum = defaultdict( int )
for row in MyFactTable.objects.all():
    d1_sum[row.attr1] += row.fact1
    d1_count[row.attr1] += 1
    d2_sum[row.attr2] += some_function( row.fact2 )
    etc.

Hi, thanks for your answer, you were right, but I decided to develop my own stuff because I still ended up writing a lot of code. You might want to check it out : https://code.google.com/p/django-cube/ — sebpiq, Jun 07 '10 at 12:51

Python / Django : emulating a multidimensional layer on a MySQL database

4 Answers4