1

How is the best way to transform the following dataframe also adding the sum of the 'status'?

Before:

plan type  hour status total
A    cont   0    ok      10
A    cont   0    notok    3
A    cont   0    other    1
A    vend   1    ok       7
A    vend   1    notok    2
A    vend   1    other    0
B    test   5    ok      20
B    test   5    notok    6
B    test   5    other   13

After:

plan type  hour  ok   notok other sum
A    cont   0    10   3      1    14
A    vend   1     7   2      0     9
B    test   5    20   6     13    39 

Thanks in advance!

Thabra
  • 337
  • 2
  • 9

2 Answers2

0

You could

In [9]: dff = df.pivot_table(index=['plan', 'type', 'hour'], columns='status', 
                             values='total')

In [10]: dff['sum'] = dff.sum(axis=1)

In [11]: dff.reset_index()
Out[11]:
status plan  type  hour  notok  ok  other  sum
0         A  cont     0      3  10      1   14
1         A  vend     1      2   7      0    9
2         B  test     5      6  20     13   39
Zero
  • 74,117
  • 18
  • 147
  • 154
0

Use set_index + unstack for reshape, add new column by assign and last reset_index with rename_axis:

df = (df.set_index(['plan', 'type', 'hour', 'status'])['total']
        .unstack()
        .assign(sum=lambda x: x.sum(1))
        .reset_index()
        .rename_axis(None, 1))
print (df)
  plan  type  hour  notok  ok  other  sum
0    A  cont     0      3  10      1   14
1    A  vend     1      2   7      0    9
2    B  test     5      6  20     13   39

If not unique triples defined by plan, type, hour then use groupby with aggregate function like mean or another answer:

print (df)
  plan  type  hour status  total
0    A  cont     0     ok     10 <- duplicate 10 for plan, type, hour
1    A  cont     0     ok    100 <- duplicate 100 for plan, type, hour
2    A  cont     0  notok      3
3    A  cont     0  other      1
4    A  vend     1     ok      7
5    A  vend     1  notok      2
6    A  vend     1  other      0
7    B  test     5     ok     20
8    B  test     5  notok      6
9    B  test     5  other     13

df = (df.groupby(['plan', 'type', 'hour', 'status'])['total'].mean()
        .unstack()
        .assign(sum=lambda x: x.sum(1))
        .reset_index()
        .rename_axis(None, 1))
print (df)
  plan  type  hour  notok  ok  other  sum
0    A  cont     0      3  55      1   59 <- 55 = (100 + 10) / 2
1    A  vend     1      2   7      0    9
2    B  test     5      6  20     13   39
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252