How to use pivot table / Groupby on dataframe

Question

I'm trying to do a seemingly very simple task. Given a dataframe:

data = {
    'lifestage': ['a', 'b', 'c', 'a', 'a', 'b'],
    'CC': [1, 1, 0, 1, 0, 0],
    'DC': [1, 0, 1, 0, 1, 0],
    'AC': [1, 1, 0, 1, 1, 1],
    'CASA': [1, 0, 0, 0, 1, 0],
    'Stage_1': [1, 0, 1, 0, 1, 0],
    'Stage_2': [0, 1, 0, 1, 0, 0],
    'Stage_3': [0, 0, 0, 1, 0, 1]
}

df1 = pd.DataFrame(data)

Where the orginal table looks like this:

lifestage	CC	DC	AC	CASA	Stage_1	Stage_2	Stage_3
a	1	1	1	1	1	0	0
b	1	0	1	0	0	1	0
c	0	1	0	0	1	0	0
a	1	0	1	0	0	0	1
a	0	1	1	1	1	0	0
b	0	0	1	0	0	0	1

So that the output will look like this

Lifestage	Product	Stage_1	Stage_2	Stage_3
a	CC	1	0	1
a	DC	2	0	0
a	AC	2	0	1
a	CASA	2	0	0
b	CC	0	1	0
b	DC	0	0	0
b	AC	0	1	1
b	CASA	0	0	0
c	CC	0	0	0
c	DC	1	0	0
c	AC	0	0	0
c	CASA	0	0	0

What is the logic? We shouldn't have to guess from your output, please be explicit — mozway, Feb 21 '23 at 15:57
What have you tried? What research have you done? I'm not sure how to do this off the top of my head, but I know some resources that should help: docs: [Reshaping and pivot tables](https://pandas.pydata.org/docs/user_guide/reshaping.html), SO question: [How can I pivot a dataframe?](https://stackoverflow.com/q/47152691/4518341) If you're stuck on a particular aspect of the problem, please [edit] to clarify, including the title; see [ask]. — wjandrea, Feb 21 '23 at 16:04
@mozway FWIW, it *looks like* it's treating CC-CASA columns as boolean (or maybe multipliers) and summing the Stage columns. — wjandrea, Feb 21 '23 at 16:06
@wjandrea yes, that's what I imagined (see my answer), but this doesn't yield exactly the provided output (another reason why the **logic** should always be clearly provided) — mozway, Feb 21 '23 at 16:07
@mozway It looks like it's not exact because there's a typo in the Stage_2 column: in the code at index 3 it's 1 but in the table it's 0. Also you'd need to pass `sort=False` to `groupby` to get the same ordering, but that's minor. Anyway, this is not to take away from your point; a description would be better. — wjandrea, Feb 21 '23 at 21:41

score 0 · Accepted Answer · answered Feb 21 '23 at 16:05

As no logic was provided I can only guess.

I assume you want to count the 1s in Stages, for Lifestages and Products which also have a 1:

(df1.melt(['lifestage', 'Stage_1', 'Stage_2', 'Stage_3'], var_name='Product')
    .loc[lambda d: d.pop('value').eq(1)]
    .groupby(['lifestage', 'Product']).sum()
    .unstack('Product', fill_value=0)
    .stack().reset_index()
)

Output:

   lifestage Product  Stage_1  Stage_2  Stage_3
0          a      AC        2        1        1
1          a    CASA        2        0        0
2          a      CC        1        1        1
3          a      DC        2        0        0
4          b      AC        0        1        1
5          b    CASA        0        0        0
6          b      CC        0        1        0
7          b      DC        0        0        0
8          c      AC        0        0        0
9          c    CASA        0        0        0
10         c      CC        0        0        0
11         c      DC        1        0        0

How to use pivot table / Groupby on dataframe

1 Answers1