1

I'm working on a Django project where I need to provide a lot of different visualizations on the same data (for example average of a value for each month, for each year / for a location, etc...).

I have been using an OLAP database once in college, and I thought that it would fit my needs, but it appears that it is much too heavy for what I need. Actually the volume of data is not very big, so I don't need any optimization, just a way to present different visualizations of the same data without having to write 1000 times the same code.

So, to recap, I need a python library:

  • to emulate a multidimensional database (OLAP style would be nice because I think it is quite convenient : star structure, and everything)
  • non-intrusive, because I can't modify anything on the existing MySQL database
  • easy-to-use, because otherwise there's no point in replacing some overhead by another.
sebpiq
  • 7,540
  • 9
  • 52
  • 69
  • "much too heavy for what I need" doesn't make much sense. You're going to have to provide details on why a SQL database and a Star Schema are "too heavy". – S.Lott Jun 01 '10 at 17:15
  • maybe I am wrong, but OLAP approach is aimed at designing data warehouses, i.e. big amount of data, where changing the dimension on which you analyse the data would take a huge amount of time. So in this case there is some kind of caching mechanism. I meant that I just need the convenience of star (or flake) structure schema, without the overhead of having to install a completely new system... Is it clearer ? – sebpiq Jun 01 '10 at 20:15

4 Answers4

2

Ok ... I finally came up with my own solution ( https://code.google.com/p/django-cube/ ), because I couldn't find what I wanted.

With a model like this:

class Instrument(models.Model):
    name = models.CharField(max_length=100)

class Musician(models.Model):
    firstname = models.CharField(max_length=100)
    instrument = models.ForeignKey(Instrument)

Create a cube:

>>> c = Cube(['instrument__name', 'firstname'], Musician.objects.all(), len)
... #Cube(dimensions, queryset, aggregation_function)
... #You can use the Django field-lookup syntax for dates and foreign keys !!!

Query the cube along one (or several) dimension(s):

>>> c.measure_dict('firstname', 'instrument__name', full=False) == {
...     'Miles': {
...         'trumpet': {'measure': 1},
...         'sax': {'measure': 0},
...         'piano': {'measure': 0},
...     },
...     'John': {
...         'trumpet': {'measure': 0},
...         'sax': {'measure': 1},
...         'piano': {'measure': 4},
...     },
... }

Use the custom template tags, etc ...

sebpiq
  • 7,540
  • 9
  • 52
  • 69
  • HI Sebpiq, Could you tool be used to handle this scenario. I've tried working with Django-Cube earlier but not in this context: http://stackoverflow.com/questions/9300922/calculating-and-storing-average-data-on-a-daily-weekly-and-monthly-and-yearly-b – Mridang Agarwalla Feb 16 '12 at 07:17
1

Why not just use the standard ORM aggregation functions: http://docs.djangoproject.com/en/dev/topics/db/aggregation/

Wherever you think performance is going to be a hit, you can denormalize that field.

lprsd
  • 84,407
  • 47
  • 135
  • 168
  • No ... Writing all these statistics calculations with Django Orm is extremely painful. As I said, I don't care about performance. I just need a convenient, powerful and flexible way to calculate aggregations on my data. I have been trying quite a lot, and writing quite a lot of code, and django ORM is clearly not suited: That is what multidimensional databases are actually made for ! – sebpiq Jun 01 '10 at 14:29
0

There is also http://cubes.databrewery.org/ . Lightweight OLAP engine in python.

It can work on top of existing database schemas, provides OLAP operations, and is easy to use.

jjmontes
  • 24,679
  • 4
  • 39
  • 51
0

You have Python defaultdict dictionaries.

If your data is small, simply query all of it and load up a bunch of dictionaries with counts and sums and what-not.

A "star schema" is just a poor-person's inverted database where dimensions (i.e., dictionaries) reference lists of rows. You can summarize those lists of rows to create summary dictionaries.

d1_sum= defaultdict( int )
d1_count= defaultdict( count )
d2_sum = defaultdict( int )
for row in MyFactTable.objects.all():
    d1_sum[row.attr1] += row.fact1
    d1_count[row.attr1] += 1
    d2_sum[row.attr2] += some_function( row.fact2 )
    etc.
S.Lott
  • 384,516
  • 81
  • 508
  • 779
  • Hi, thanks for your answer, you were right, but I decided to develop my own stuff because I still ended up writing a lot of code. You might want to check it out : https://code.google.com/p/django-cube/ – sebpiq Jun 07 '10 at 12:51