I have a numpy array of probabilities indicating whether each output is a 1 or a 0. I'm trying to split these probabilities into two separate arrays based on a certainty level of 75%. If either probability is above 75%, the instance goes into the 'certain' array, and if neither crosses that threshold, it goes into the 'uncertain' array.

For some reason, when I run this code it does not correctly differentiate between the two and proceeds to add all the instances to the 'certain' array.

Code:

probs = rfc.predict_proba(X_validate)

certain = []
uncertain = []

for i in probs[0:10]:
    zero_val = i[0]
    one_val = i[1]

    if zero_val or one_val > 0.75:
        certain.append(i)
    else:
        uncertain.append(i)


print(len(certain))
print(certain)

print(len(uncertain))
print(uncertain)

Here is the output:

10
[array([0., 1.]), array([1., 0.]), array([0.95, 0.05]), array([0.77, 0.23]), array([0.74, 0.26]), array([0.38, 0.62]), array([0.11, 0.89]), array([1., 0.]), array([0.94, 0.06]), array([0.19, 0.81])]
0
[]

What is causing every instance to be added to the 'certain' array regardless? Thanks!

jblew
  • Note that the alleged question has many answers which use `in`, which do not work here because the operator is `>`, not `==`. – Florian Weimer Jul 24 '18 at 19:56

1 Answer

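Because `>` binds more tightly than `or`, `zero_val or one_val > 0.75` is parsed as `zero_val or (one_val > 0.75)`, which in this context is more or less equivalent to `zero_val != 0 or one_val > 0.75`. So `zero_val` is essentially treated as a boolean flag: any nonzero value makes the condition true. You need to write `zero_val > 0.75 or one_val > 0.75`.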
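As a minimal sketch of that fix, the loop from the question could look like the following. The `probs` values are copied from the output shown in the question and stand in for `rfc.predict_proba(X_validate)[0:10]`; plain tuples are used instead of numpy arrays so the snippet runs on its own, and the `THRESHOLD` name is only illustrative.

```python
# Stand-in data copied from the question's printed output
# (assumption: it replaces rfc.predict_proba(X_validate)[0:10]).
probs = [
    (0.0, 1.0), (1.0, 0.0), (0.95, 0.05), (0.77, 0.23), (0.74, 0.26),
    (0.38, 0.62), (0.11, 0.89), (1.0, 0.0), (0.94, 0.06), (0.19, 0.81),
]

THRESHOLD = 0.75  # illustrative name for the 75% certainty level

certain = []
uncertain = []

for zero_val, one_val in probs:
    # Compare each value to the threshold explicitly; writing
    # `zero_val or one_val > THRESHOLD` would only test zero_val for truthiness.
    if zero_val > THRESHOLD or one_val > THRESHOLD:
        certain.append((zero_val, one_val))
    else:
        uncertain.append((zero_val, one_val))

print(len(certain))    # 8
print(len(uncertain))  # 2
print(uncertain)       # [(0.74, 0.26), (0.38, 0.62)]
```

With an actual numpy array, the same split can presumably be done without a loop by building a boolean mask such as `probs.max(axis=1) > 0.75` and indexing with the mask and its negation.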

Florian Weimer