(Please note that there's a question Pandas: group by and Pivot table difference, but this question is different.)
Suppose you start with a DataFrame
df = pd.DataFrame({'a': ['x'] * 2 + ['y'] * 2, 'b': [0, 1, 0, 1], 'val': range(4)})
>>> df
Out[18]:
a b val
0 x 0 0
1 x 1 1
2 y 0 2
3 y 1 3
Now suppose you want to make the index a
, the columns b
, the values in a cell val
, and specify what to do if there are two or more values in a resulting cell:
b 0 1
a
x 0 1
y 2 3
Then you can do this either through
df.val.groupby([df.a, df.b]).sum().unstack()
or through
pd.pivot_table(df, index='a', columns='b', values='val', aggfunc='sum')
So it seems to me that there's a simple correspondence between correspondence between the two (given one, you could almost write a script to transform it into the other). I also thought of more complex cases with hierarchical indices / columns, but I still see no difference.
Is there something I've missed?
Are there operations that can be performed using one and not the other?
Are there, perhaps, operations easier to perform using one over the other?
If not, why not deprecate
pivot_tale
?groupby
seems much more general.