Sum of Boolean List in Python not functioning as expected

Question

I understand that Python treats True as 1 (as many programming languages do), so taking sum() of a list of Booleans should return the number of True values in the list (as demonstrated in "Counting the number of True Booleans in a Python List").

I'm new to Python and have been going through some of the ISLR application exercises in Python (http://www.springer.com/us/book/9781461471370).

Chapter 2, problem 10 (h) has a pretty simple question asking for the number of observations of a variable ('rm') that are greater than 7. I would expect the following code to work:

test = [Boston['rm'] > 7]
sum(test)

However, this returns the entire list "test" with 0's and 1's, not its sum. Can anyone explain why? (Note: Boston is the Boston data set from the MASS package in R.)

If I use a tuple or numpy array instead of a list it works just fine; for example:

test2 = (Boston['rm'] > 7)
sum(test2)

test3 = np.array(Boston['rm'] > 7)
sum(test3)

Also "test" seems to be a proper Boolean list because the following code using it to subset "Boston" also works fine:

test4 = Boston[Boston['rm'] > 7]
len(test4)

While I have clearly found several methods that work, I'm confused why the first did not. Thanks in advance.

Because it calculates `0` + first row + second row + .... If you want the sum, use `(Boston['rm'] > 7).sum()`. – Willem Van Onsem, Dec 28 '17 at 22:38

score 6 · Accepted Answer · answered Dec 28 '17 at 22:41

If I use a tuple or numpy array instead of a list it works just fine; for example:
test2 = (Boston['rm'] > 7)
sum(test2)

test3 = np.array(Boston['rm'] > 7)
sum(test3)

(Boston['rm'] > 7) uses parentheses for grouping; it isn’t a tuple. The tuple equivalent would be (Boston['rm'] > 7,) (note the comma), and it breaks in the same way as the list does. Using np.array on an array doesn’t change it; it’s like the difference between list(x) and [x] for a list x, where the first keeps the same elements and the second nests x inside a new list.
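To make the grouping-versus-tuple point concrete, here is a minimal sketch in plain Python. The list `flags` is a made-up stand-in for the comparison result, not data from the question.

```python
# Hypothetical stand-in for the result of Boston['rm'] > 7
flags = [True, False, True]

grouped = (flags)      # parentheses only group; this is the same list object
one_tuple = (flags,)   # the trailing comma is what creates a one-element tuple

print(grouped is flags)          # True
print(type(one_tuple).__name__)  # tuple
print(len(one_tuple))            # 1
print(sum(grouped))              # 2
```

Summing `grouped` counts the True values because it is still the original sequence; `one_tuple` holds a single item (the whole sequence), just as `[flags]` would.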

As for why it doesn’t work: Boston['rm'] > 7 is an array, so you want to get its sum directly. Wrapping it in another list means you’re taking the sum of a list of arrays and not a list of booleans.
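A small sketch of that behaviour, assuming a NumPy array in place of the Boston column (the values in `rm` are invented for illustration; a pandas Series should behave the same way here):

```python
import numpy as np

# Hypothetical stand-in for Boston['rm']; these values are made up.
rm = np.array([6.5, 7.2, 5.9, 8.1, 7.0])

mask = rm > 7        # elementwise comparison gives a boolean array
wrapped = [mask]     # a list with ONE element: the whole array

# sum(wrapped) computes 0 + wrapped[0], which adds 0 to every element
elementwise = sum(wrapped)
print(elementwise)   # [0 1 0 1 0]

# Summing the array itself counts the True values
count = int(mask.sum())
print(count)         # 2
```

The built-in `sum(mask)` gives the same count, since it iterates over the individual booleans rather than over a one-element list.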

answered Dec 28 '17 at 22:41

Ry-

I see, so that's why the sum returned a list of 0's and 1's: each value (True or False) was treated (and summed) as a separate item (to 1 or 0 respectively)? – K Morgan Dec 29 '17 at 01:48
@KMorgan: Yes, `sum(l)` evaluates `0 + l[0] (+ l[1] + …)`, and the numpy magic makes `0 + l[0]` equivalent to adding `0` to every item of the array. – Ry- Dec 29 '17 at 02:23
