0

I have a data set pulled from a database using pandas.io.sql.read_frame which looks like this

      Period Category    Projected       Actual     Previous
 0   2013-01 A          1214432.94   3175516.32   3001149.50  
 1   2013-01 B           624010.78    867729.20    866639.38
 2   2013-01 C          2533443.36   2314765.87   2482210.68
 3   2013-01 D          5616228.49   5672648.92   5918737.79
 4   2013-01 E           492184.31   1009281.36    990499.75
 5   2013-01 F         32824689.07  29610034.26  32248832.59
 6   2013-01 G            94192.33    152839.03    189061.80
 7   2013-01 H          1271544.89   1545591.40   1054648.58
 8   2013-01 I          8273369.88   8656894.51   8691683.73
 9   2013-01 J          8540953.73   8012622.14   8671895.07
 10  2013-01 K          8016059.13   8530401.75   9953181.37
 11  2013-01 L          1190095.56    512354.65    459954.82
 12  2013-01 M           850057.11   1077172.22   1097503.89
 13  2013-02 A          1227779.01   2850482.70   3070764.66
 14  2013-02 B           636124.55    822016.04    866802.59
 15  2013-02 C          2581194.49   2471194.78   2681301.30
 16  2013-02 D          5970719.17   5179206.09   5872806.59
 17  2013-02 E           477820.01   1199334.74   1330452.48
 18  2013-02 F         34537100.44  29082997.97  31982248.04
 19  2013-02 G            92523.45     75865.03     93782.83
 ...

If I pivot the table using D.pivot_table(rows="Category", cols="Period", aggfunc="sum") I get a multi-indexed data frame looking like this

<class 'pandas.core.frame.DataFrame'>
Index: 13 entries, A ...
Data columns (total 33 columns):
(Projected, 2013-01)    13  non-null values
(Projected, 2013-02)    13  non-null values
(Projected, 2013-03)    13  non-null values
(Projected, 2013-04)    13  non-null values
(Projected, 2013-05)    13  non-null values
(Projected, 2013-06)    13  non-null values
(Projected, 2013-07)    13  non-null values
(Projected, 2013-08)    13  non-null values
(Projected, 2013-09)    13  non-null values
(Projected, 2013-10)    13  non-null values
(Projected, 2013-11)    12  non-null values
(Actual, 2013-01)       13  non-null values
(Actual, 2013-02)       13  non-null values
(Actual, 2013-03)       13  non-null values
(Actual, 2013-04)       13  non-null values
(Actual, 2013-05)       13  non-null values
(Actual, 2013-06)       13  non-null values
(Actual, 2013-07)       13  non-null values
(Actual, 2013-08)       13  non-null values
(Actual, 2013-09)       13  non-null values
(Actual, 2013-10)       13  non-null values
(Actual, 2013-11)       12  non-null values
(Previous, 2013-01)     13  non-null values
(Previous, 2013-02)     13  non-null values
(Previous, 2013-03)     13  non-null values
(Previous, 2013-04)     13  non-null values
(Previous, 2013-05)     13  non-null values
(Previous, 2013-06)     13  non-null values
(Previous, 2013-07)     13  non-null values
(Previous, 2013-08)     13  non-null values
(Previous, 2013-09)     13  non-null values
(Previous, 2013-10)     13  non-null values
(Previous, 2013-11)     12  non-null values
dtypes: float64(33)     

But I don't want a hierarchical index on the columns. Instead, I would like to have the non-pivot columns (Projected, Actual, and Previous) as values in the form of triple (i.e. a Series) so the final table looks like this

Period 2013-01 2013-02 2013-03 ...
Group
A       Series  Series  Series ...
B       Series  Series  Series ...
C       Series  Series  Series ...
D       Series  Series  Series ...
...

Where each 'Series' is a pandas series of three numbers being the respective (Projected, Actual, and Previous) aggregate values.

I've looked at stack, unstack, various combination or rows, cols, and values arguments to pivot table and of the melt function in pandas.core.reshape, but none of them seem to do quite what I want.

sirlark
  • 2,187
  • 2
  • 18
  • 28

2 Answers2

1

I believe you're presenting us and XY problem, as resulting dataset containing Series has no practical applicability.

Maybe you're looking for a groupby object instead of pivot?

>>> df.groupby(["Category", 'Period']).get_group(('A', '2013-01'))
    Period Category   Projected      Actual   Previous
0  2013-01        A  1214432.94  3175516.32  3001149.5
>>> df.groupby(["Category", 'Period']).get_group(('A', '2013-01'))[['Projected', 'Actual', 'Previous']].sum()
Projected    1214432.94
Actual       3175516.32
Previous     3001149.50
dtype: float64
Community
  • 1
  • 1
alko
  • 46,136
  • 12
  • 94
  • 102
  • I need the data in tabular format for convenient traversal via the django template language, i.e. I want to loop over the rows. Ultimately, I want each cell in the table to have three values, comma separated. Hence wanting tuples in a pivoted format. If I use group_by, I'll essentially have to build the table in python. If I can get pandas to do this rather than in python, which is very slow, I would consider that practical applicability. What can I do here to avoid the XY problem? – sirlark Dec 29 '13 at 11:23
  • @sirlark, have a look at my answer down below. I have been playing around with this kind of solutions in a few other contexts, and it has worked well. I can never remember the syntax without out trying it a few different ways, but essentially once you have a list of series (done with a list interpolation below), you can `zip(*[list_of_series])` to create tuples. – 8one6 Dec 30 '13 at 18:11
0

I believe @alko is on the right track suggesting a groupby at the beginning followed by a sum. If your goal is to then have an iterable in each place then you can use zip to create a column of tuples. How about this:

import pandas as pd
import numpy as np
from itertools import product

np.random.seed(1)

periods = range(0,3)
categories = list('ABC')

rows = list(product(periods, categories)) * 2
n = len(rows)

df = pd.DataFrame({'Projected': np.random.randn(n), 
                   'Actual': np.random.randn(n), 
                   'Previous': np.random.randn(n)},
                  index = pd.MultiIndex.from_tuples(rows))
df.index.names = ['Period', 'Category']
summed = df.groupby(level=['Period', 'Category']).sum()
summed['tuple'] = zip(*[summed[c] for c in ['Projected', 'Actual', 'Previous']])
result = summed['tuple'].unstack('Period')

Gives

enter image description here


And just for completeness, you can go back the other way, though it's a bit of a pain:

andback = result.stack().apply(lambda t: pd.Series({'Projected': t[0],
                                              'Actual': t[1],
                                              'Previous': t[2]}))

Gives

enter image description here


And just to help someone out in the comments. Here's how I'd add subtotals and grand totals:

def add_subtotal(g):
    category = g.index.get_level_values('Category')[0]
    g.loc[(category, 'subtotal'), :] = g.sum()
    return g

with_subtotals = andback.groupby(level='Category', axis=0).transform(add_subtotal)

with_subtotals.loc[('Grand', 'Total'), :] = with_subtotals\
    .loc[with_subtotals.index.get_level_values('Period')=='subtotal', :]\
    .sum()

Which gives:

enter image description here

8one6
  • 13,078
  • 12
  • 62
  • 84
  • I've hunted around for a way to do this without breaking out into python to form the tuples, and cannot find one, so it looks like this is closest I'm going to get, and I will accept this answer. At least zip doesn't involve a python loop directly, and it is reasonably fast for my situation. – sirlark Jan 06 '14 at 17:07
  • Glad it helped. One other note: with tuples in the content of a DataFrame, you can easily turn a single column into a MultiIndex with something like: `df.index = pd.MultiIndex.from_tuples(df['tuple_column'])` – 8one6 Jan 06 '14 at 17:44
  • Do you know if it is possible to add sub-totals to each category (A, B, C)? Then have a final total at the end? – Tristan Forward Aug 21 '14 at 23:04
  • @8one6 The "andback" one. Ideally having a sub-total under each category, or if that's not possible. Adding to the Period Column a "A Total", "B Total", "C Total" and then a "Grand Total" – Tristan Forward Aug 22 '14 at 02:40
  • This should really be a separate question, but I've put at least one way of doing this above. – 8one6 Aug 22 '14 at 12:39
  • @8one6 Thanks so much! Re-asked question here: http://stackoverflow.com/questions/25449581/python-pandas-dataframe-with-sub-totals-and-grand-total – Tristan Forward Aug 22 '14 at 14:37