0

I have a dataset that looks like this:

Col1    Col2    Col3
A        100     100
A         0       0
A         0      100
B        100      0
C        100     100
C        100     100

I want to count the number of rows with 100 (or any other values greater than zero) based on A B and C

which will result to this:

    Col2_counts   Col3_counts
A       1           2
B       1           0
C       2           2

so I can calculate the total percentage of A B C in Col2 and Col3 etc.

I tried df.groupby(['Col1', 'Col 2', 'Col3']).transform ('count'), but it doesn't give me the desired result.

kiwi_kimchi
  • 345
  • 3
  • 12
  • What are your counting exactly? `count` is for counting rows, regardless of their value. It seems like (but you said nothing, we have to reverse engineer that) you are counting non-0 aka 100 values. In which case, `df.groupby('col1').sum()/100` seems to be what you are looking for. But are all the values either 0 or 100? You didn't say. – chrslg Jun 26 '23 at 03:54
  • @chrslg as shown in the desired result, the rows with 100. but let me edit it for clarification. Thanks for pointing it out :) – kiwi_kimchi Jun 26 '23 at 03:58
  • Well, it wasn't shown in the desired result. It could have been has well the rows with non-0 values. Or the rows with values over 50. All your examples shows only either 0 or 100 value, but, all those assumption match equally the example, but are not equivalent, unless you state that values can only be 0 or 100. There again, you could replay "yes, they can be only 0 or 100, as shown", but, well, firstly, specification usually don't take the form of a guessing game, and secondly, it would be quite strange: it would be a binary value; usually we use booleans for that, not 0 and 100. – chrslg Jun 26 '23 at 04:05
  • And the answer you accepted doesn't count the rows with 100, but the rows with >0 values. – chrslg Jun 26 '23 at 04:11

2 Answers2

2
df.set_index('Col1').gt(0).groupby(level=0).sum()

output:

        Col2    Col3
Col1        
A       1       2
B       1       0
C       2       2
Panda Kim
  • 6,246
  • 2
  • 12
0

The following block produces your expected output:

(
    df
    .set_index("Col1")
    .eq(100)
    .groupby("Col1")
    .sum()
    .add_suffix("_counts")
)

As chrslg pointed out, Panda Kim's answer evaluates rows > 0 instead of rows == 100. Also, you can add the "_counts" suffix at the end to get the column names from your example.

Here's a the step-by-step explanation of each operation:

  1. Set "Col1" as the DataFrame's index.

  2. Evaluate if cells are equal (eq) to 100.

          Col2    Col3
    Col1        
    A     True    True
    A     False   False
    A     False   True
    B     True    False
    C     True    True
    C     True    True
    
  3. Group by the "Col1" index and count True values. Using the sum operation works because it converts True to 1 and False to 0.

  4. Add a "_counts" suffix to all columns.

          Col2_counts   Col3_counts
    Col1        
    A         1              2
    B         1              0
    C         2              2
    
Lonbot
  • 15
  • 5