-2

I have a data frame set up as shown below (this is an example, the original is much larger):

enter image description here

I am trying to answer the question, "How many rows have three true values? Two true values? One true value?"

I would prefer to do it in SQL but I am struggling to find a solution. I am also struggling to find a solution in Pandas using Python. I would be open to either solution at this point.

If anyone has any pointers, I will be greatly appreciative. I'm sure it's a simple solution, I just haven't learned it yet and I can't find the correct way to do it.

forpas
  • 160,666
  • 10
  • 38
  • 76
mam9292
  • 33
  • 6
  • 1
    Please post a minimal reproducible example, and always post the actual data rather than an image of the data. You will receive higher quality answers if you don't ask folks to manually enter your test dataset. – Michael Ruth Mar 14 '21 at 04:16

3 Answers3

2

You can try:

(df.isin(["TRUE"])).sum(axis=1).value_counts()

The part df.isin(["TRUE"]) rather than df == "TRUE" is used to get around the issue described here.

Example Sample dataframe:

import numpy as np
a = np.random.choice(["TRUE", "FALSE"], (5, 5))
df = pd.DataFrame(a)
df = df.reset_index()
print(df)

This gives:

   index      0      1      2      3      4
0      0   TRUE   TRUE   TRUE  FALSE   TRUE
1      1  FALSE   TRUE  FALSE   TRUE   TRUE
2      2   TRUE   TRUE   TRUE  FALSE  FALSE
3      3  FALSE  FALSE   TRUE  FALSE   TRUE
4      4   TRUE   TRUE   TRUE   TRUE   TRUE

Then

(df.isin(["TRUE"])).sum(axis=1).value_counts()

gives:

3    2
5    1
4    1
2    1

i.e. "TRUE" appears 3 times in 2 rows, 5 times in one row, 4 times in one row, and 2 times in one row.

bb1
  • 7,174
  • 2
  • 8
  • 23
0

You can do it with a LEFT join of a CTE that returns all the possible outcomes (0, 1, 2, 3) to the table and aggregation.

If the data type of the columns ValueX is TEXT:

WITH cte(total) AS (VALUES (0), (1), (2), (3))
SELECT c.total, COUNT(t.number) counter
FROM cte c LEFT JOIN tablename t
ON c.total = (t.ValueA = 'TRUE') + (t.ValueB = 'TRUE') + (t.ValueC = 'TRUE')
GROUP BY total

If you want the results in 1 row:

SELECT SUM(total = 0) total_0,
       SUM(total = 1) total_1,
       SUM(total = 2) total_2,
       SUM(total = 3) total_3
FROM (       
  SELECT (ValueA = 'TRUE') + (ValueB = 'TRUE') + (ValueC = 'TRUE') total
  FROM tablename
)

See the demo.

If the data type of the columns ValueX is BOOLEAN (or INTEGER):

WITH cte(total) AS (VALUES (0), (1), (2), (3))
SELECT c.total, COUNT(t.number) counter
FROM cte c LEFT JOIN tablename t
ON c.total = t.ValueA + t.ValueB + t.ValueC
GROUP BY total

If you want the results in 1 row:

SELECT SUM(total = 0) total_0,
       SUM(total = 1) total_1,
       SUM(total = 2) total_2,
       SUM(total = 3) total_3
FROM (       
  SELECT ValueA + ValueB + ValueC total
  FROM tablename
)

See the demo.

forpas
  • 160,666
  • 10
  • 38
  • 76
0

Actually, if the values are strings, there is a hacky way of using string functions for this:

select 15 - length(A || B || C) as num_true, count(*)
from t
group by num_true;

This uses the fact that TRUE has 4 characters and FALSE has five, so this won't work in all languages.

You can incorporate this logic so the values are in separate columns:

select sum(15 = length(A || B || C) as true_0,
       sum(14 = length(A || B || C) as true_1,
       sum(13 = length(A || B || C) as true_2,
       sum(12 = length(A || B || C) as true_3
from t;
Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786