Context
I'm working on a Python 3.11 project and I'm having a difficult time understanding how the float type works.
More specifically, I'm working on distances between data points, and I also have thresholds for these distances.
But let me explain in order.
I have a distance called threshold, which is a numpy.float32. It is the distance between two arbitrary data points, and I'll use it as a threshold for other distances. But before using it, I floor it to the 10th decimal place:
import math
display(threshold)  # threshold is a numpy.float32
threshold_floored = math.floor(threshold * 10000000000) / 10000000000  # floor to 10 decimal places
display(threshold_floored)
>>> output:
0.16666667
0.1666666716
I now use a clustering algorithm that creates clusters based on distance and uses threshold_floored as its threshold. Points in cluster A have a distance smaller than or equal to threshold_floored to points in cluster B. If for some reason the distance between a point in cluster A and a point in cluster B is bigger than threshold_floored, I print a sentence to notify me of this error.
Running my code, I sometimes see the printed sentence, but when I check, I get this:
display(threshold_floored)
display(distance_pointsAB)
>>> output:
0.1666666716
0.16666667
The distance looks less than threshold_floored (but equal to threshold), so why do I get the notification?
BTW, the notification code is this:
if distance_pointsAB > threshold_floored:
    print("Notification")
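A minimal sketch of why the check fires, assuming threshold and distance_pointsAB both hold the float32 value nearest to 1/6 (that value reproduces the digits printed in the question). To stay within the standard library, the hypothetical helper to_float32 emulates numpy.float32 rounding with struct, and the hypothetical helper float32_repr approximates how NumPy displays a float32: with the fewest digits that identify it among float32 values. A Python float is displayed the same way but among float64 values, so it needs more digits, which makes the float32 look smaller even though it is not.

```python
import math
import struct


def to_float32(x: float) -> float:
    """Hypothetical helper: round x to the nearest IEEE 754 single-precision
    value (what numpy.float32(x) stores) and return it as a Python float."""
    return struct.unpack("f", struct.pack("f", x))[0]


def float32_repr(x: float) -> str:
    """Hypothetical helper: the fewest significant digits that still round back
    to the same float32, approximating how NumPy displays a float32."""
    for digits in range(1, 10):  # 9 significant digits always suffice for float32
        text = f"{x:.{digits}g}"
        if to_float32(float(text)) == x:
            return text
    return repr(x)


# Assumed values: both equal to the float32 nearest 1/6, as in the question's output.
threshold = to_float32(1 / 6)
distance_pointsAB = threshold
threshold_floored = math.floor(threshold * 10_000_000_000) / 10_000_000_000

print(float32_repr(distance_pointsAB))  # 0.16666667 (shortest float32 digits)
print(repr(threshold_floored))          # 0.1666666716 (shortest float64 digits)

# The comparison uses the exact stored values, not the displayed digits.
if distance_pointsAB > threshold_floored:
    print("Notification")
```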
Problem
However, I noticed the following:
distance_pointsAB_floored = math.floor(distance_pointsAB * 10000000000)/10000000000
display(threshold)
display(threshold_floored)
display(distance_pointsAB)
display(distance_pointsAB_floored)
print("{0:.60f}".format(threshold))
print("{0:.60f}".format(threshold_floored))
print("{0:.60f}".format(distance_pointsAB))
print("{0:.60f}".format(distance_pointsAB_floored))
>>> output:
0.16666667
0.1666666716
0.16666667
0.1666666716
0.166666671633720397949218750000000000000000000000000000000000 <---- threshold
0.166666671600000010355913104831415694206953048706054687500000 <---- threshold_floored
0.166666671633720397949218750000000000000000000000000000000000 <---- distance_pointsAB
0.166666671600000010355913104831415694206953048706054687500000 <---- distance_pointsAB_floored
The notification now makes sense: with the extra decimals shown, distance_pointsAB is indeed bigger than threshold_floored.
However, why doesn't math.floor round threshold or distance_pointsAB to 0.166666671600000000000000000000000000000000000000000000000000?
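A minimal sketch of where the extra digits come from, assuming threshold holds the float32 value nearest 1/6, written here as the exact Python float 11184811 / 2**26. math.floor itself is exact; the inexactness enters at the division, because 0.1666666716 has no finite binary representation and Python stores the nearest double. The decimal module can floor in base 10 exactly, but converting its result back to a float lands on the same double again.

```python
import math
from decimal import ROUND_FLOOR, Decimal

# Assumed value: the float32 nearest 1/6, written exactly as a Python float.
threshold = 11184811 / 2**26  # == 0.16666667163372039794921875

scaled = math.floor(threshold * 10_000_000_000)  # exact integer: 1666666716
threshold_floored = scaled / 10_000_000_000      # nearest double to 0.1666666716

# Every finite float is an integer over a power of two, so a decimal such as
# 0.1666666716 (denominator 10**10) can only be approximated.
numerator, denominator = threshold_floored.as_integer_ratio()
print(denominator & (denominator - 1) == 0)  # True: a power of two
print(Decimal(threshold_floored))            # the binary value actually stored

# decimal can floor in base 10 exactly...
exact = Decimal(threshold).quantize(Decimal("1e-10"), rounding=ROUND_FLOOR)
print(exact)  # 0.1666666716
# ...but turning it back into a float gives the same double as before.
print(float(exact) == threshold_floored)  # True
```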
And also, since my clustering algorithm should separate points in cluster A and cluster B if their distance is less than my threshold, and I used threshold_floored as the criterion, why do I get points in A and in B whose distance is bigger than the threshold? It seems that my clustering algorithm used threshold instead of threshold_floored. Am I right?
Is there a way to work properly with floats?
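One common approach is to compare with a tolerance instead of relying on exact ordering right at the boundary. A minimal sketch using math.isclose; exceeds_threshold is a hypothetical helper, and rel_tol=1e-6 is an assumed value chosen to sit several times above float32's machine epsilon (about 1.2e-7), not something taken from the question.

```python
import math


def exceeds_threshold(distance: float, threshold: float, rel_tol: float = 1e-6) -> bool:
    """Hypothetical helper: True only when distance is above threshold by more
    than rounding noise. rel_tol=1e-6 is an assumed choice, several times
    float32's machine epsilon (2**-23, about 1.2e-7)."""
    return distance > threshold and not math.isclose(distance, threshold, rel_tol=rel_tol)


threshold_floored = 0.1666666716
distance_pointsAB = 11184811 / 2**26  # assumed: float32 value nearest 1/6

print(distance_pointsAB > threshold_floored)                    # True: exact comparison
print(exceeds_threshold(distance_pointsAB, threshold_floored))  # False: within tolerance
print(exceeds_threshold(0.2, threshold_floored))                # True: clearly above
```

The tolerance has to be chosen for the data: too small and float32 rounding still triggers it, too large and genuinely distinct distances are treated as equal.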
EDIT
I found the problem. My threshold was a numpy.float32, and flooring it converted it into a Python float. But then my clustering algorithm converted threshold_floored back to numpy.float32, while distance_pointsAB ended up as a float. The solution is a matter of setting the value types properly.
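A sketch of the fix described in the edit, again assuming both values are the float32 nearest 1/6 and emulating numpy.float32 with the hypothetical struct-based helper to_float32. Once threshold_floored is rounded to the same precision as the distances, it becomes identical to threshold, because the two differ by about 3.4e-11 while neighbouring float32 values near 1/6 are 2**-26 (about 1.5e-8) apart. In NumPy itself the cast would be numpy.float32(...) or an array's .astype(numpy.float32).

```python
import math
import struct


def to_float32(x: float) -> float:
    """Hypothetical helper: round x to the nearest IEEE 754 single-precision
    value (what numpy.float32(x) stores) and return it as a Python float."""
    return struct.unpack("f", struct.pack("f", x))[0]


threshold = to_float32(1 / 6)  # assumed float32 threshold, as in the question
distance_pointsAB = threshold  # assumed: a distance equal to the threshold
threshold_floored = math.floor(threshold * 10_000_000_000) / 10_000_000_000

# Bring both sides of the comparison to the same precision (float32 here).
threshold_floored_32 = to_float32(threshold_floored)
distance_32 = to_float32(distance_pointsAB)

print(threshold_floored_32 == threshold)   # True: float32 rounding undoes the floor
print(distance_32 > threshold_floored_32)  # False: no notification
```

For this value, flooring to 10 decimal places has no effect once the result is stored as float32. With real NumPy scalars, explicit casts like these also avoid depending on how a numpy.float32 is promoted when mixed with a Python float, rules that changed in NumPy 2.0 (NEP 50).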
Thanks, everybody, for your advice!
…comment may not be what you care about. Distances are best modeled, I think, as real numbers, and floating-point provides a decent approximation of real numbers, albeit with finite precision. But the finite precision is a certain number of *bits* in base 2, not digits in base 10. When you print finite-precision base-2 fractions out in decimal, they look weird. – Steve Summit Jan 17 '23 at 17:55
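To put the comment's point in numbers, a short standard-library sketch comparing how many bits each format keeps, roughly how many decimal digits that corresponds to, and how far apart neighbouring values are near 1/6. FLOAT32_MANT_DIG is a constant defined here for illustration, since Python's own float is double precision on common platforms.

```python
import math
import sys

FLOAT32_MANT_DIG = 24  # significand bits in IEEE 754 single precision (numpy.float32)
FLOAT64_MANT_DIG = sys.float_info.mant_dig  # 53 for Python's float

# Roughly how many decimal digits each format's bits correspond to.
float64_digits = FLOAT64_MANT_DIG * math.log10(2)  # about 15.95
float32_digits = FLOAT32_MANT_DIG * math.log10(2)  # about 7.22

# Gap between neighbouring representable values near 1/6 (normal numbers):
# 1/6 == m * 2**exponent with 0.5 <= m < 1, so the spacing is 2**(exponent - bits).
_, exponent = math.frexp(1 / 6)
float64_spacing = 2.0 ** (exponent - FLOAT64_MANT_DIG)  # 2**-55, same as math.ulp(1 / 6)
float32_spacing = 2.0 ** (exponent - FLOAT32_MANT_DIG)  # 2**-26, about 1.5e-8

print(f"float64: {FLOAT64_MANT_DIG} bits, ~{float64_digits:.2f} digits, spacing {float64_spacing:.3g}")
print(f"float32: {FLOAT32_MANT_DIG} bits, ~{float32_digits:.2f} digits, spacing {float32_spacing:.3g}")
```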