7

I'd like a way to summarise a database table so that rows sharing a common ID are summarised into one row of output.

My tools are SQLite and Python 2.x.

For example, given the following table of fruit prices at my local supermarkets...

+--------------------+--------------------+--------------------+
|Fruit               |Shop                |Price               |
+--------------------+--------------------+--------------------+
|Apple               |Coles               |$1.50               |
|Apple               |Woolworths          |$1.60               |
|Apple               |IGA                 |$1.70               |
|Banana              |Coles               |$0.50               |
|Banana              |Woolworths          |$0.60               |
|Banana              |IGA                 |$0.70               |
|Cherry              |Coles               |$5.00               |
|Date                |Coles               |$2.00               |
|Date                |Woolworths          |$2.10               |
|Elderberry          |IGA                 |$10.00              |
+--------------------+--------------------+--------------------+

... I want to produce a summary table showing me the price of each fruit at each supermarket. Blank spaces should be filled by NULLs.

+----------+----------+----------+----------+
|Fruit     |Coles     |Woolworths|IGA       |
+----------+----------+----------+----------+
|Apple     |$1.50     |$1.60     |$1.70     |
|Banana    |$0.50     |$0.60     |$0.70     |
|Cherry    |NULL      |$5.00     |NULL      |
|Date      |$2.00     |$2.10     |NULL      |
|Elderberry|NULL      |NULL      |$10.00    |
+----------+----------+----------+----------+

I believe the literature calls this a "pivot table" or a "pivot query", but apparently SQLite doesn't support PIVOT. (The solution in that question uses hardcoded LEFT JOINs. This doesn't really appeal to me because I don't know the "column" names in advance.)

Right now I do this by iterating through the entire table in Python and accumulating a dict of dicts, which is a bit klutzy. I am open to better solutions, either in Python or SQLite, that will give the data in tabular form.

Community
  • 1
  • 1
Li-aung Yip
  • 12,320
  • 5
  • 34
  • 49

2 Answers2

13

The pandas package can handle this very nicely.

>>> import pandas
>>> df=pandas.DataFrame(data, columns=['Fruit', 'Shop', 'Price'])
>>> df.pivot(index='Fruit', columns='Shop', values='Price')
Shop        Coles   IGA  Woolworths
Fruit                              
Apple         1.5   1.7         1.6
Banana        0.5   0.7         0.6
Cherry        5.0   NaN         NaN
Date          2.0   NaN         2.1
Elderberry    NaN  10.0         NaN

The documentation: http://pandas.pydata.org/pandas-docs/stable/reshaping.html

Some IPython Notebooks to learn pandas: https://bitbucket.org/hrojas/learn-pandas

Hope that will help.
Regards
Patrick Brockmann

PBrockmann
  • 303
  • 5
  • 16
8

On python side, you could use some itertools magic for rearranging your data:

data = [('Apple',      'Coles',      1.50),
        ('Apple',      'Woolworths', 1.60),
        ('Apple',      'IGA',        1.70),
        ('Banana',     'Coles',      0.50),
        ('Banana',     'Woolworths', 0.60),
        ('Banana',     'IGA',        0.70),
        ('Cherry',     'Coles',      5.00),
        ('Date',       'Coles',      2.00),
        ('Date',       'Woolworths', 2.10),
        ('Elderberry', 'IGA',        10.00)]

from itertools import groupby, islice
from operator import itemgetter
from collections import defaultdict

stores = sorted(set(row[1] for row in data))
# probably splitting this up in multiple lines would be more readable
pivot = ((fruit, defaultdict(lambda: None, (islice(d, 1, None) for d in data))) for fruit, data in groupby(sorted(data), itemgetter(0)))

print 'Fruit'.ljust(12), '\t'.join(stores)
for fruit, prices in pivot:
    print fruit.ljust(12), '\t'.join(str(prices[s]) for s in stores)

Output:

Fruit        Coles      IGA     Woolw
Apple        1.5        1.7     1.6
Banana       0.5        0.7     0.6
Cherry       5.0        None    None
Date         2.0        None    2.1
Elderberry   None       10.0    None
sloth
  • 99,095
  • 21
  • 171
  • 219
  • `itertools` magic is my favourite kind of magic. Hoping someone else will post an SQLite solution, but this is already better than what I was doing. – Li-aung Yip Jul 11 '12 at 09:07
  • @Li-aungYip There's [no support](http://www.sqlite.org/cvstrac/tktview?tn=1424) for pivot tables in SQLite, hence the ~1000 questions here on stackoverflow asking how to do that. But there's a [perl module](https://github.com/bduggan/SQLite-VirtualTable-Pivot) that does that, with an example [here](http://search.cpan.org/~bduggan/SQLite-VirtualTable-Pivot-0.02/lib/SQLite/VirtualTable/Pivot.pm). – sloth Jul 11 '12 at 09:45
  • 1
    I ended up using this to write a function that creates the pivot table as above) and then writes it back to the database as a temporary table. This lets you do further queries (`JOIN`s, etc) on the pivoted data. Temporary tables are ugly as all get out, but "if it's stupid and it works, it ain't stupid." – Li-aung Yip Jul 13 '12 at 06:55
  • @Li-aungYip This is what `SQLite::VirtualTable::Pivot` does, too. But I'm not a big perl fan, so I hope you share your code with the world :-) – sloth Jul 13 '12 at 07:01
  • It's... unbelievably ugly. I'll post it on the Code Review stack exchange later (if I remember.) – Li-aung Yip Jul 13 '12 at 08:28