5

This should be quick, but none of the pivot/groupby work I'm doing is coming up with what I need.

I have a table like this:

        Letter  Period  Amount
YrMnth
2014-12      B       6       0
2014-12      C       8       1
2014-12      C       9       2
2014-12      C      10       3
2014-12      C       6       4
2014-12      C      12       5
2014-12      C       7       6
2014-12      C      11       7
2014-12      D       9       8
2014-12      D      10       9
2014-12      D       1      10
2014-12      D       8      11
2014-12      D       6      12
2014-12      D      12      13
2014-12      D       7      14
2014-12      D      11      15
2014-12      D       4      16
2014-12      D       3      17
2015-01      B       7      18
2015-01      B       8      19
2015-01      B       1      20
2015-01      B      10      21
2015-01      B      11      22
2015-01      B       6      23
2015-01      B       9      24
2015-01      B       3      25
2015-01      B       5      26
2015-01      C      10      27

I want to pivot it, so that the Index is basically YrMonth and Letter, the Period are the columns, and the Amount are the values.

I understand Pivot in general, but am getting errors when I try to do it with multiple indexes. I made the index a column, and tried this:

In [76]: df.pivot(index=['YrMnth','Letter'], values='Amount', columns='Period')

But I came out with this error:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-76-fc2a4c5f244d> in <module>()
----> 1 df.pivot(index=['YrMnth','Letter'], values='Amount', columns='Period')

/Users/chaseschwalbach/anaconda/lib/python2.7/site-packages/pandas/core/frame.pyc in pivot(self, index, columns, values)
   3761         """
   3762         from pandas.core.reshape import pivot
-> 3763         return pivot(self, index=index, columns=columns, values=values)
   3764
   3765     def stack(self, level=-1, dropna=True):

/Users/chaseschwalbach/anaconda/lib/python2.7/site-packages/pandas/core/reshape.pyc in pivot(self, index, columns, values)
    331         indexed = Series(self[values].values,
    332                          index=MultiIndex.from_arrays([index,
--> 333                                                        self[columns]]))
    334         return indexed.unstack(columns)
    335

/Users/chaseschwalbach/anaconda/lib/python2.7/site-packages/pandas/core/series.pyc in __init__(self, data, index, dtype, name, copy, fastpath)
    225                                        raise_cast_failure=True)
    226
--> 227                 data = SingleBlockManager(data, index, fastpath=True)
    228
    229         generic.NDFrame.__init__(self, data, fastpath=True)

/Users/chaseschwalbach/anaconda/lib/python2.7/site-packages/pandas/core/internals.pyc in __init__(self, block, axis, do_integrity_check, fastpath)
   3734             block = make_block(block,
   3735                                placement=slice(0, len(axis)),
-> 3736                                ndim=1, fastpath=True)
   3737
   3738         self.blocks = [block]

/Users/chaseschwalbach/anaconda/lib/python2.7/site-packages/pandas/core/internals.pyc in make_block(values, placement, klass, ndim, dtype, fastpath)
   2452
   2453     return klass(values, ndim=ndim, fastpath=fastpath,
-> 2454                  placement=placement)
   2455
   2456

/Users/chaseschwalbach/anaconda/lib/python2.7/site-packages/pandas/core/internals.pyc in __init__(self, values, placement, ndim, fastpath)
     85             raise ValueError('Wrong number of items passed %d,'
     86                              ' placement implies %d' % (
---> 87                                  len(self.values), len(self.mgr_locs)))
     88
     89     @property

ValueError: Wrong number of items passed 138, placement implies 2
user1610719
  • 1,275
  • 2
  • 18
  • 35
  • Well the index is effectively the first two columns (YrMnth, and Letter), thus there shouldn't be any duplicates if you group it that way. I just can't figure out that method – user1610719 Dec 16 '15 at 15:46

1 Answers1

4

If I understand you correctly,pivot_table might be closer to what you need:

df = df.pivot_table(index=["YrMnth", "Letter"], columns="Period", values="Amount")

Which gives you:

Period          1   3   4   5   6   7   8   9   10  11  12
YrMnth  Letter                                            
2014-12 B      NaN NaN NaN NaN   0 NaN NaN NaN NaN NaN NaN
        C      NaN NaN NaN NaN   4   6   1   2   3   7   5
        D       10  17  16 NaN  12  14  11   8   9  15  13
2015-01 B       20  25 NaN  26  23  18  19  24  21  22 NaN
        C      NaN NaN NaN NaN NaN NaN NaN NaN  27 NaN NaN

As suggested in the comments:

 df = pd.pivot_table(df, index=["YrMnth", "Letter"], columns="Period", values="Amount")


Period          1   3   4   5   6   7   8   9   10  11  12
YrMnth  Letter                                            
2014-12 B      NaN NaN NaN NaN   0 NaN NaN NaN NaN NaN NaN
        C      NaN NaN NaN NaN   4   6   1   2   3   7   5
        D       10  17  16 NaN  12  14  11   8   9  15  13
2015-01 B       20  25 NaN  26  23  18  19  24  21  22 NaN
        C      NaN NaN NaN NaN NaN NaN NaN NaN  27 NaN NaN

Also yields the same, if someone wants to clarify how the former will fail that would be great.

Padraic Cunningham
  • 176,452
  • 29
  • 245
  • 321
  • The correct syntax for `pivot_table` should be: `df = pd.pivot_table(df, index=["YrMnth", "Letter"], columns="Period", values="Amount")` – Fabio Lamanna Dec 16 '15 at 15:57
  • @Fabio, what is the difference? – Padraic Cunningham Dec 16 '15 at 15:59
  • The correct syntax is `pandas.pivot_table()` not `df.pivot_table()`. – Fabio Lamanna Dec 16 '15 at 16:00
  • @Fabio, again, what is the difference? – Padraic Cunningham Dec 16 '15 at 16:01
  • Just the definition of the functions. `df.pivot()` and `pandas.pivot_table()` are different: [df.pivot()](http://pandas.pydata.org/pandas-docs/version/0.17.1/generated/pandas.DataFrame.pivot.html) and [pd.pivot_table()](http://pandas.pydata.org/pandas-docs/version/0.17.1/generated/pandas.pivot_table.html) – Fabio Lamanna Dec 16 '15 at 16:05
  • @Fabio, they both give identical output, what is actually different? – Padraic Cunningham Dec 16 '15 at 16:05
  • I didn't test the output, just referring to the correct syntax definitions as proposed in the documentation. Maybe [this](http://stackoverflow.com/questions/30960338/pandas-difference-between-pivot-and-pivot-table-why-is-only-pivot-table-workin) should help. – Fabio Lamanna Dec 16 '15 at 16:09
  • @Fabio, I don't see any difference at all and have seen pivot_table called as a method of the dataframe many times so not sure what the difference essentially is – Padraic Cunningham Dec 16 '15 at 16:11