Counting the number of rows based on the value of N number of columns

Question

I have a dataset that looks like this:

Col1    Col2    Col3
A        100     100
A         0       0
A         0      100
B        100      0
C        100     100
C        100     100

I want to count the number of rows with 100 (or any other values greater than zero) based on A B and C

which will result to this:

    Col2_counts   Col3_counts
A       1           2
B       1           0
C       2           2

so I can calculate the total percentage of A B C in Col2 and Col3 etc.

I tried df.groupby(['Col1', 'Col 2', 'Col3']).transform ('count'), but it doesn't give me the desired result.

What are your counting exactly? `count` is for counting rows, regardless of their value. It seems like (but you said nothing, we have to reverse engineer that) you are counting non-0 aka 100 values. In which case, `df.groupby('col1').sum()/100` seems to be what you are looking for. But are all the values either 0 or 100? You didn't say. — chrslg, Jun 26 '23 at 03:54
@chrslg as shown in the desired result, the rows with 100. but let me edit it for clarification. Thanks for pointing it out :) — kiwi_kimchi, Jun 26 '23 at 03:58
Well, it wasn't shown in the desired result. It could have been has well the rows with non-0 values. Or the rows with values over 50. All your examples shows only either 0 or 100 value, but, all those assumption match equally the example, but are not equivalent, unless you state that values can only be 0 or 100. There again, you could replay "yes, they can be only 0 or 100, as shown", but, well, firstly, specification usually don't take the form of a guessing game, and secondly, it would be quite strange: it would be a binary value; usually we use booleans for that, not 0 and 100. — chrslg, Jun 26 '23 at 04:05
And the answer you accepted doesn't count the rows with 100, but the rows with >0 values. — chrslg, Jun 26 '23 at 04:11

score 2 · Accepted Answer · answered Jun 26 '23 at 03:58

2

df.set_index('Col1').gt(0).groupby(level=0).sum()

output:

        Col2    Col3
Col1        
A       1       2
B       1       0
C       2       2

answered Jun 26 '23 at 03:58

Panda Kim

6,246
2
12

Although this answer was accepted, it could probably benefit from some explanation. – jared Jun 26 '23 at 04:37

score 0 · Answer 2 · answered Jun 26 '23 at 04:43

The following block produces your expected output:

(
    df
    .set_index("Col1")
    .eq(100)
    .groupby("Col1")
    .sum()
    .add_suffix("_counts")
)

As chrslg pointed out, Panda Kim's answer evaluates rows > 0 instead of rows == 100. Also, you can add the "_counts" suffix at the end to get the column names from your example.

Here's a the step-by-step explanation of each operation:

Set "Col1" as the DataFrame's index.

Evaluate if cells are equal (eq) to 100.

      Col2    Col3
Col1        
A     True    True
A     False   False
A     False   True
B     True    False
C     True    True
C     True    True

Group by the "Col1" index and count True values. Using the sum operation works because it converts True to 1 and False to 0.

Add a "_counts" suffix to all columns.

      Col2_counts   Col3_counts
Col1        
A         1              2
B         1              0
C         2              2

Counting the number of rows based on the value of N number of columns

2 Answers2