pandas pivot table is adding unwanted columns which causes my counts to duplicate

Question

I am trying to pivot on a dataframe that looks like so

country col_a col_b col_c status      group
   a       4    5     6   confirmed     z
   a       4    5     6   failed        z
   a       4    5     6   unknown       y
   a       4    5     6   confirmed     z
   b       4    5     6   failed        y
   b       4    5     6   confirmed     y
   b       4    5     6   failed        z
   b       4    5     6   confirmed     z
   b       4    5     6   confirmed     z

I am trying to pivot so that I have a total for each country, and then each group within that country is broken down. As below.

country group confirmed failed unknown
   a      NaN       2      1       1
  Nan      z        2      1       0
  NaN      y        0      0       1
   b      NaN       3      2       0
  NaN      z        2      1       0
  NaN      y        1      1       0

The issue i'm having is that whilst it will look just like this, it will then append the other cols across the top and just repeat the status as below.

                       col_a                    col_b                    col_c
country group confirmed failed unknown confirmed failed unknown confirmed failed unknown
   a      NaN       2      1       1      2      1       1         2      1       1 
  Nan      z        2      1       0      2      1       0         2      1       0 
  NaN      y        0      0       1      0      0       1         0      0       1
   b      NaN       3      2       0      3      2       0         3      2       0 
  NaN      z        2      1       0      2      1       0         2      1       0
  NaN      y        1      1       0      1      1       0         1      1       0

The code im using is -

testdf = df2.pivot_table(index=['country','group'], columns='status', aggfunc=len, fill_value=0)

and when it prints in the console, it looks fine. But as soon as I output to excel, its all broken!

Any ideas?

`df.pivot_table(index=['country','group'], columns='status',aggfunc='size',fill_value=0).reset_index()` — ansev, Apr 27 '20 at 17:16

score 0 · Answer 1 · answered Apr 27 '20 at 17:18

I am not sure if it is duplicated question, so I decided reopen, I think you want aggfunc='size'

new_df = (df.pivot_table(index=['country','group'],
                         columns='status',
                         aggfunc='size',
                         fill_value=0)
            .reset_index()
            .rename_axis(None, axis=1))
print(new_df)

  country group  confirmed  failed  unknown
0       a     y          0       0        1
1       a     z          2       1        0
2       b     y          1       1        0
3       b     z          2       1        0

piRSquared · Answer 2 · 2020-04-27T17:58:21.253

I noticed that OP is looking for what appears to be a "total" row per country. This is a strategy to get that.

from collections import defaultdict

result = defaultdict(int)

cols = ('country', 'group', 'status')

for c, g, s in zip(*map(df2.get, cols)):
    result[(c, g, s)] += 1
    result[(c, 'total', s)] += 1

pd.Series(result).rename_axis(cols[:2] + (None,)).unstack(fill_value=0).reset_index()

  country  group  confirmed  failed  unknown
0       a  total          2       1        1
1       a      y          0       0        1
2       a      z          2       1        0
3       b  total          3       2        0
4       b      y          1       1        0
5       b      z          2       1        0

Strategy 2

result = {}

for c, grp in df2.groupby('country'):
    result[(c, 'total')] = {**grp.status.value_counts()}
    for g, grp_ in grp.groupby('group'):
        result[(c, g)] = {**grp_.status.value_counts()}

idx = pd.MultiIndex.from_tuples(result.keys(), names=['country', 'group'])
pd.DataFrame.from_records([*result.values()], idx) \
  .fillna(0, downcast='infer').reset_index()

  country  group  confirmed  unknown  failed
0       a  total          2        1       1
1       a      y          0        1       0
2       a      z          2        0       1
3       b  total          3        0       2
4       b      y          1        0       1
5       b      z          2        0       1

Strategy 3

x = df2.groupby(['group', 'country', 'status']).size()
y = pd.concat({'total': x.groupby(['country', 'status']).size()}, names=['group'])
x.append(y).unstack(fill_value=0) \
 .rename_axis(None, axis=1).swaplevel(0, 1).sort_index().reset_index()

  country  group  confirmed  failed  unknown
0       a  total          1       1        1
1       a      y          0       0        1
2       a      z          2       1        0
3       b  total          2       2        0
4       b      y          1       1        0
5       b      z          2       1        0

score 0 · Answer 3 · answered Apr 27 '20 at 18:28

0

df['abc'] = df[['col_a','col_b','col_c']].sum(axis=1)
table = pd.pivot_table(df, index =['country','group'], columns='status', values='abc', fill_value = 0)

answered Apr 27 '20 at 18:28

Mo Huss

434
2
11

1

Please don't post only code as an answer, but also provide an explanation what your code does and how it solves the problem of the question. Answers with an explanation are usually of higher quality, and are more likely to attract upvotes. – today Apr 28 '20 at 23:19
1

Please don't get me wrong: I am not saying your answer is good or bad, perfect or terrible. I am just saying that it would be better to accompany your code with some explanation so that the readers/visitors can understand what is going on and how it solves the problem. But if you think that's not needed and it's self-explanatory, then it's fine. – today Apr 29 '20 at 01:27

pandas pivot table is adding unwanted columns which causes my counts to duplicate

3 Answers3