0

I'm trying to do a seemingly very simple task. Given a dataframe:

data = {
    'lifestage': ['a', 'b', 'c', 'a', 'a', 'b'],
    'CC': [1, 1, 0, 1, 0, 0],
    'DC': [1, 0, 1, 0, 1, 0],
    'AC': [1, 1, 0, 1, 1, 1],
    'CASA': [1, 0, 0, 0, 1, 0],
    'Stage_1': [1, 0, 1, 0, 1, 0],
    'Stage_2': [0, 1, 0, 1, 0, 0],
    'Stage_3': [0, 0, 0, 1, 0, 1]
}

df1 = pd.DataFrame(data)

Where the orginal table looks like this:

lifestage CC DC AC CASA Stage_1 Stage_2 Stage_3
a 1 1 1 1 1 0 0
b 1 0 1 0 0 1 0
c 0 1 0 0 1 0 0
a 1 0 1 0 0 0 1
a 0 1 1 1 1 0 0
b 0 0 1 0 0 0 1

So that the output will look like this

Lifestage Product Stage_1 Stage_2 Stage_3
a CC 1 0 1
a DC 2 0 0
a AC 2 0 1
a CASA 2 0 0
b CC 0 1 0
b DC 0 0 0
b AC 0 1 1
b CASA 0 0 0
c CC 0 0 0
c DC 1 0 0
c AC 0 0 0
c CASA 0 0 0
wjandrea
  • 28,235
  • 9
  • 60
  • 81
Sudowoodo9
  • 15
  • 4
  • 1
    What is the logic? We shouldn't have to guess from your output, please be explicit – mozway Feb 21 '23 at 15:57
  • What have you tried? What research have you done? I'm not sure how to do this off the top of my head, but I know some resources that should help: docs: [Reshaping and pivot tables](https://pandas.pydata.org/docs/user_guide/reshaping.html), SO question: [How can I pivot a dataframe?](https://stackoverflow.com/q/47152691/4518341) If you're stuck on a particular aspect of the problem, please [edit] to clarify, including the title; see [ask]. – wjandrea Feb 21 '23 at 16:04
  • @mozway FWIW, it *looks like* it's treating CC-CASA columns as boolean (or maybe multipliers) and summing the Stage columns. – wjandrea Feb 21 '23 at 16:06
  • @wjandrea yes, that's what I imagined (see my answer), but this doesn't yield exactly the provided output (another reason why the **logic** should always be clearly provided) – mozway Feb 21 '23 at 16:07
  • @mozway It looks like it's not exact because there's a typo in the Stage_2 column: in the code at index 3 it's 1 but in the table it's 0. Also you'd need to pass `sort=False` to `groupby` to get the same ordering, but that's minor. Anyway, this is not to take away from your point; a description would be better. – wjandrea Feb 21 '23 at 21:41

1 Answers1

0

As no logic was provided I can only guess.

I assume you want to count the 1s in Stages, for Lifestages and Products which also have a 1:

(df1.melt(['lifestage', 'Stage_1', 'Stage_2', 'Stage_3'], var_name='Product')
    .loc[lambda d: d.pop('value').eq(1)]
    .groupby(['lifestage', 'Product']).sum()
    .unstack('Product', fill_value=0)
    .stack().reset_index()
)

Output:

   lifestage Product  Stage_1  Stage_2  Stage_3
0          a      AC        2        1        1
1          a    CASA        2        0        0
2          a      CC        1        1        1
3          a      DC        2        0        0
4          b      AC        0        1        1
5          b    CASA        0        0        0
6          b      CC        0        1        0
7          b      DC        0        0        0
8          c      AC        0        0        0
9          c    CASA        0        0        0
10         c      CC        0        0        0
11         c      DC        1        0        0
mozway
  • 194,879
  • 13
  • 39
  • 75