Why does Python round this sum?

Question

I'm working with 64-bit floating-point arithmetic as defined by IEEE 754. The smallest subnormal number is:
2^-1074 = 5e-324 = 5 * 10^-16 * 10^-308

Adding the latter to realmin results in:
2^-1022 + 2^-1074 = 2.2250738585072014 * 10^-308 + 5 * 10^-16 * 10^-308 = (2.2250738585072014 + 0.0000000000000005) * 10^-308 = 2.2250738585072019 * 10^-308

When performing the addition in Python the result is slightly different. Here's the simple script:

import numpy as np

realmin = np.power(2.0, -1022)
print( "realmin\t\t" + str(realmin) )

smallestSub = np.power(2.0, -1074)
print( "smallest sub\t" + str(smallestSub) )

realminSucc = realmin + smallestSub
print( "sum\t\t" + str(realminSucc) )

The output is:

realmin         2.2250738585072014e-308
smallest sub    5e-324
sum             2.225073858507202e-308

Why does it round the sum? There's space for one extra digit, as shown by the realmin output.

All terms are rounded. All operations with floating point numbers result in rounding, since floating point can only hold up to around 15 decimal digits of accuracy — Ṃųỻịgǻňạcểơửṩ, Dec 13 '19 at 17:52
Don't rely on the output of `str`, which will "prettify" your number. Try using `repr`. For example, that gives me `2.2250738585072019e-308` on my machine. — juanpa.arrivillaga, Dec 13 '19 at 17:54
@juanpa.arrivillaga Thanks for your answer. Neither `print(repr(realminSucc))` nor `print(realminSucc)` solved it for me. I wonder why it gives two different results on our machines. Realmin is printed with 16 digits for the fractional part, and I don't understand why the sum is printed with only 15. All the addition does is add the digit 5 to the least significant digit of realmin. Why does it get rounded up? — zcb, Dec 13 '19 at 18:31
@Prune: This is not a duplicate of that question. This one asks about a specific situation, and the answer involves particular behaviors of the formatted conversion to decimal, not the general floating-point behavior discussed in that question. — Eric Postpischil, Dec 13 '19 at 19:38
Hmmm ... okay. I conflate the results, but you have a good point. — Prune, Dec 13 '19 at 19:43
I have an answer drafted that shows why `realmin` needs one more digit than `sum`, but I cannot currently post it because this question was promiscuously closed as a duplicate of [that question](https://stackoverflow.com/questions/588004/is-floating-point-math-broken), which is general and does not address the conversion method apparently used in this case. — Eric Postpischil, Dec 13 '19 at 20:00
@EricPostpischil I think the question is open now. If not, is there anything I can do to reopen it in order to give you the possibility to answer? — zcb, Dec 13 '19 at 20:27

score 0 · Accepted Answer · answered Dec 14 '19 at 00:07

Python is not strict about floating-point behavior, so some of the following is speculative—it depends on the implementation.

Java and JavaScript require the default conversion of floating-point values to strings to use just enough decimal digits to uniquely distinguish the floating-point value. For example, if the representable values in some floating-point format were 3, 3.0625, 3.125, 3.1875, and so on, then converting 3.0625 to a string yields “3.06” because that uniquely distinguishes it from 3 and 3.125, and it must be that long because the shorter “3.1” does not distinguish it from 3.125. But converting 3.125 to a string yields “3.1” because that is enough for it; converting 3.1 to the nearest representable value yields 3.125.
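The toy format above can be tried out directly. The following is only a brute-force sketch, not the algorithm real conversion libraries use: `STEP`, `nearest_representable`, and `shortest_repr` are names made up for this illustration, and the format is assumed to contain exactly the multiples of 0.0625.

```python
from fractions import Fraction

# Hypothetical toy format: the representable values are the multiples of 1/16
# (3, 3.0625, 3.125, 3.1875, ...), as in the example above.
STEP = Fraction(1, 16)


def nearest_representable(x: Fraction) -> Fraction:
    """Round x to the nearest multiple of STEP (ties go to the even multiple)."""
    return round(x / STEP) * STEP


def shortest_repr(value: Fraction, max_digits: int = 10) -> str:
    """Try 1, 2, 3, ... significant digits; return the first string that rounds back to value."""
    for digits in range(1, max_digits + 1):
        candidate = f"{float(value):.{digits}g}"
        if nearest_representable(Fraction(candidate)) == value:
            return candidate
    raise ValueError("no round-tripping string found")


print(shortest_repr(Fraction(49, 16)))  # 3.0625 -> prints 3.06
print(shortest_repr(Fraction(25, 8)))   # 3.125  -> prints 3.1
```

Exact rationals are used for the comparison so that the sketch does not itself depend on binary floating-point rounding.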

Because Java and JavaScript require this, subroutines for doing those conversions are becoming common, and a Python implementation might use them since they are readily available. This behavior would explain the results you see in your Python implementation.
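For CPython specifically, the documentation for version 3.1 and later says that `repr` of a float produces the shortest string that converts back to the same value, so the behavior can be checked directly. A minimal sketch, assuming CPython and plain `float` rather than NumPy; `math.ldexp` is used here (it is not in the question's script) to build the two values exactly:

```python
import math

realmin = math.ldexp(1.0, -1022)       # 2**-1022, smallest normal double
smallest_sub = math.ldexp(1.0, -1074)  # 2**-1074, smallest subnormal double
total = realmin + smallest_sub

for name, value in [("realmin", realmin), ("sum", total)]:
    text = repr(value)
    # Each printed string converts back to exactly the same double.
    print(name, text, float(text) == value)

# Asking for 17 significant digits explicitly shows the digit the question expected.
print(f"{total:.17g}")  # prints 2.2250738585072019e-308
```

The 17-digit form matches the value reported in the comment under the question; a fixed 17 significant digits is always enough to identify a double, whereas the shortest form uses as few digits as it can get away with.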

Although the question states “2^-1074 = 5e-324”, this is not true. 2⁻¹⁰⁷⁴ is exactly 4.940656458412465441765687928682213723650598026143247644255856825006755072702087518652998363616359923797965646954457177309266567103559397963987747960107818781263007131903114045278458171678489821036887186360569987307230500063874091535649843873124733972731696151400317153853980741262385655911710266585566867681870395603106249319452715914924553293054565444011274801297099995419319894090804165633245247571478690147267801593552386115501348035264934720193790268107107491703332226844753335720832431936092382893458368060106011506169809753078342277318329247904982524730776375927247874656084778203734469699533647017972677717585125660551199131504891101451037862738167250955837389733598993664809941164205702637090279242767544565229087538682506419718265533447265625 • 10⁻³²⁴. The exact values of the floating-point numbers matter in the formatting. Near 2⁻¹⁰²², the representable values are:

2⁻¹⁰²² − 2⁻¹⁰⁷⁴ = 2.2250738585072008890245868760858598876504231122409594654935248025624400092282356951787758888037591552642309780950434312085877387158357291821993020294379224223559819827501242041788969571311791082261043971979604000454897391938079198936081525613113376149842043271751033627391549782731594143828136275113838604094249464942286316695429105080201815926642134996606517803095075913058719846423906068637102005108723282784678843631944515866135041223479014792369585208321597621066375401613736583044193603714778355306682834535634005074073040135602968046375918583163124224521599262546494300836851861719422417646455137135420132217031370496583210154654068035397417906022589503023501937519773030945763173210852507299305089761582519159720757232455434770912461317493580281734466552734375 • 10⁻³⁰⁸.
2⁻¹⁰²² = 2.225073858507201383090232717332404064219215980462331830553327416887204434813918195854283159012511020564067339731035811005152434161553460108856012385377718821130777993532002330479610147442583636071921565046942503734208375250806650616658158948720491179968591639648500635908770118304874799780887753749949451580451605050915399856582470818645113537935804992115981085766051992433352114352390148795699609591288891602992641511063466313393663477586513029371762047325631781485664350872122828637642044846811407613911477062801689853244110024161447421618567166150540154285084716752901903161322778896729707373123334086988983175067838846926092773977972858659654941091369095406136467568702398678315290680984617210924625396728515625 • 10⁻³⁰⁸.
2⁻¹⁰²² + 2⁻¹⁰⁷⁴ = 2.2250738585072018771558785585789482407880088486837041956131300312119688603996006965297904292212628858639037013670281908017171296072711910355127227413175152199055740043138804567803233377539881639177387328959246074229270113078053813397081653361296447449529789521218979090783852583365901851789618799885150427514782636076021680436220311292700454832073964845713103912225963935608322440623896907276890186717054549275173986589324810401738228328251245795065655738191038008646911615828719989708647293221449796971546706720399791990809160347625980385995424739847678861180095072511543762389603716215171729816011544604359531284325406441938645324905389137795680915804792405099227413854274942620542640408839836919187418172987793340279242767544565229087538682506419718265533447265625 • 10⁻³⁰⁸.
2⁻¹⁰²² + 2•2⁻¹⁰⁷⁴ = 2.225073858507202371221524399825492417356801716905076560672932645536733285985283197205297699430014751163740063003020570598281825052988921962169433097257311618680370015095758583081036528065392691763555900744906711111645647364804112062758171723538798309937366264595295182248000398368305570577036006227080633922504922164288936230661591439894977428478987977026639696679140794688312373772389232659678427752122018252042155806801495766953982188063736129641369100312575820243717972293621169304087413797478551780397864281278268544917722045363748655580517781818995617950934297749406849316597964346304638590078974833882923081797242441461636291003104968899481242069589385613709015202152589845793237400783350172912858237869043043055848553508913045817507736501283943653106689453125 • 10⁻³⁰⁸.
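These exact expansions can also be reproduced from Python. A short sketch, assuming the standard `decimal` module, whose `Decimal(float)` constructor converts a binary double exactly; the names `ulp` and `neighbours` are just labels chosen for this example:

```python
import math
from decimal import Decimal
from fractions import Fraction

realmin = math.ldexp(1.0, -1022)  # 2**-1022
ulp = math.ldexp(1.0, -1074)      # 2**-1074, the spacing between doubles here

neighbours = {
    "2^-1022 - 2^-1074": realmin - ulp,
    "2^-1022": realmin,
    "2^-1022 + 2^-1074": realmin + ulp,
    "2^-1022 + 2*2^-1074": realmin + 2 * ulp,
}

for label, value in neighbours.items():
    # Decimal(value) is the exact decimal expansion of the double, not a rounded one.
    print(label, "=", Decimal(value))

# Sanity check: the exact expansion of ulp really is 1 / 2**1074.
print(Fraction(Decimal(ulp)) == Fraction(1, 2**1074))
```

All four sums and differences are exactly representable, so no rounding happens before the conversion to `Decimal`.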

Now we can see why 2⁻¹⁰²² must be displayed as “2.2250738585072014e-308”. If it were displayed with one fewer digit, as “2.225073858507201e-308”, that would be closer to 2⁻¹⁰²² − 2⁻¹⁰⁷⁴ than to 2⁻¹⁰²², so it would be wrong.

However, for 2⁻¹⁰²² + 2⁻¹⁰⁷⁴, “2.225073858507202e-308” suffices because the closest representable value to that is 2⁻¹⁰²² + 2⁻¹⁰⁷⁴. 2⁻¹⁰²² + 2•2⁻¹⁰⁷⁴ is further away.
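The two "closest representable value" claims can be checked with exact rational arithmetic. This is only an illustrative sketch; `nearest_neighbour` is a helper invented for this example, and it considers just the four doubles listed above:

```python
import math
from fractions import Fraction

realmin = math.ldexp(1.0, -1022)  # 2**-1022
ulp = math.ldexp(1.0, -1074)      # 2**-1074

candidates = {
    "realmin - ulp": realmin - ulp,
    "realmin": realmin,
    "realmin + ulp": realmin + ulp,
    "realmin + 2*ulp": realmin + 2 * ulp,
}


def nearest_neighbour(text: str) -> str:
    """Return the label of the candidate double closest to the decimal string."""
    target = Fraction(text)  # exact value of the decimal string
    return min(candidates, key=lambda label: abs(Fraction(candidates[label]) - target))


print(nearest_neighbour("2.225073858507201e-308"))   # prints realmin - ulp
print(nearest_neighbour("2.2250738585072014e-308"))  # prints realmin
print(nearest_neighbour("2.225073858507202e-308"))   # prints realmin + ulp
```

So the 16-digit string is too short for realmin but long enough for the sum, which is the asymmetry the question noticed.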

After reading your answer I think that's exactly what's happening in my case, so thank you for taking the time. I'm curious: what did you use to calculate the exact values? — zcb, Dec 18 '19 at 22:01
@zcb: I used C compiled with Clang on macOS. Apple put in some work to ensure the C displays of floating-point values are correct (when you request enough digits). For example, you can show one of the numbers above with `#include <stdio.h>` / `int main(void) { printf("%.9999g\n", 0x1p-1022 + 0x1p-1074); }`. — Eric Postpischil, Dec 18 '19 at 22:17
