I'm working with 64-bit floating-point arithmetic as defined by IEEE 754. The smallest subnormal number is:
2^-1074 = 5e-324 = 5 * 10^-16 * 10^-308

Adding the latter to realmin results in:
2^-1022 + 2^-1074 = 2.2250738585072014 * 10^-308 + 5 * 10^-16 * 10^-308 = (2.2250738585072014 + 0.0000000000000005) * 10^-308 = 2.2250738585072019 * 10^-308

When performing the addition in Python the result is slightly different. Here's the simple script:

import numpy as np

realmin = np.power(2.0, -1022)
print( "realmin\t\t" + str(realmin) )

smallestSub = np.power(2.0, -1074)
print( "smallest sub\t" + str(smallestSub) )

realminSucc = realmin + smallestSub
print( "sum\t\t" + str(realminSucc) )

The output is:

realmin         2.2250738585072014e-308
smallest sub    5e-324
sum             2.225073858507202e-308

Why does it round the sum? There is room for one more digit, as the realmin output shows.

Mark Dickinson
zcb
    What's the Python version that you're using? – Arn Dec 13 '19 at 17:50
  • All terms are rounded. All operations with floating point numbers result in rounding, since floating point can only hold up to around 15 decimal digits of accuracy – Ṃųỻịgǻňạcểơửṩ Dec 13 '19 at 17:52
  • Don't rely on the output of `str`, which will "prettify" your number. Try using `repr`. For example, that gives me `2.2250738585072019e-308` on my machine. – juanpa.arrivillaga Dec 13 '19 at 17:54
  • @Arn I'm using Python 3.7 – zcb Dec 13 '19 at 18:22
  • @juanpa.arrivillaga Thanks for your answer. Neither `print(repr(realminSucc))` nor `print(realminSucc)` solved it for me. I wonder why it gives two different results on our machines. Realmin is printed with 16 digits in the fractional part, and I don't understand why the sum is shown with only 15. All it does is add the digit 5 to the least significant digit of realmin. Why does it get rounded up? – zcb Dec 13 '19 at 18:31
  • @Prune: This is not a duplicate of that question. This one asks about a specific situation, and the answer involves particular behaviors of the formatted conversion to decimal, not the general floating-point behavior discussed in that question. – Eric Postpischil Dec 13 '19 at 19:38
  • Hmmm ... okay. I conflate the results, but you have a good point. – Prune Dec 13 '19 at 19:43
  • I have an answer drafted that shows why an extra digit is needed for `realmin` than for `sum` but cannot currently post it because this question was promiscuously closed as a duplicate of [that question](https://stackoverflow.com/questions/588004/is-floating-point-math-broken), which is general and does not address the conversion method apparently used in this case. – Eric Postpischil Dec 13 '19 at 20:00
  • @EricPostpischil I think the question is open now. If not, is there anything I can do to reopen it in order to give you the possibility to answer? – zcb Dec 13 '19 at 20:27

1 Answer

Python is not strict about floating-point behavior, so some of the following is speculative—it depends on the implementation.

Java and JavaScript require the default conversion of floating-point values to strings to use just enough decimal digits to uniquely distinguish the floating-point value. For example, if the representable values in some floating-point format were 3, 3.0625, 3.125, 3.1875, and so on, then converting 3.0625 to a string yields “3.06” because that uniquely distinguishes it from 3 and 3.125, and it must be that long because the shorter “3.1” does not distinguish it from 3.125. But converting 3.125 to a string yields “3.1” because that is enough for it; converting 3.1 to the nearest representable value yields 3.125.
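The "just enough digits" rule can be sketched in Python with a brute-force search. This is only an illustration: the helper name `shortest_digits` is made up here, and real implementations use dedicated algorithms rather than trying each precision in turn. The search also assumes that the correctly rounded string at a given length is the one to test, which holds for the values in this question.

```python
def shortest_digits(x: float) -> str:
    """Return the shortest correctly rounded scientific-notation string
    that converts back to exactly x (hypothetical helper, brute force)."""
    for precision in range(17):  # 1 to 17 significant digits
        candidate = format(x, f".{precision}e")
        if float(candidate) == x:
            return candidate
    raise AssertionError("17 significant digits always round-trip a binary64 value")


realmin = 2.0 ** -1022
smallest_sub = 2.0 ** -1074

print(shortest_digits(realmin))                 # 2.2250738585072014e-308
print(shortest_digits(realmin + smallest_sub))  # 2.225073858507202e-308
```

The two results differ in length (17 versus 16 significant digits), matching the output shown in the question.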

Because Java and JavaScript require this, subroutines for doing those conversions are becoming common, and a Python implementation might use them since they are readily available. This behavior would explain the results you see in your Python implementation.

Although the question states “2^-1074 = 5e-324”, this is not exactly true. 2^-1074 is exactly 4.940656458412465441765687928682213723650598026143247644255856825006755072702087518652998363616359923797965646954457177309266567103559397963987747960107818781263007131903114045278458171678489821036887186360569987307230500063874091535649843873124733972731696151400317153853980741262385655911710266585566867681870395603106249319452715914924553293054565444011274801297099995419319894090804165633245247571478690147267801593552386115501348035264934720193790268107107491703332226844753335720832431936092382893458368060106011506169809753078342277318329247904982524730776375927247874656084778203734469699533647017972677717585125660551199131504891101451037862738167250955837389733598993664809941164205702637090279242767544565229087538682506419718265533447265625 • 10^-324. The exact values of the floating-point numbers matter in the formatting. Near 2^-1022, the representable values are:

  • 2^-1022 - 2^-1074 = 2.2250738585072008890245868760858598876504231122409594654935248025624400092282356951787758888037591552642309780950434312085877387158357291821993020294379224223559819827501242041788969571311791082261043971979604000454897391938079198936081525613113376149842043271751033627391549782731594143828136275113838604094249464942286316695429105080201815926642134996606517803095075913058719846423906068637102005108723282784678843631944515866135041223479014792369585208321597621066375401613736583044193603714778355306682834535634005074073040135602968046375918583163124224521599262546494300836851861719422417646455137135420132217031370496583210154654068035397417906022589503023501937519773030945763173210852507299305089761582519159720757232455434770912461317493580281734466552734375 • 10^-308.
  • 2^-1022 = 2.225073858507201383090232717332404064219215980462331830553327416887204434813918195854283159012511020564067339731035811005152434161553460108856012385377718821130777993532002330479610147442583636071921565046942503734208375250806650616658158948720491179968591639648500635908770118304874799780887753749949451580451605050915399856582470818645113537935804992115981085766051992433352114352390148795699609591288891602992641511063466313393663477586513029371762047325631781485664350872122828637642044846811407613911477062801689853244110024161447421618567166150540154285084716752901903161322778896729707373123334086988983175067838846926092773977972858659654941091369095406136467568702398678315290680984617210924625396728515625 • 10^-308.
  • 2^-1022 + 2^-1074 = 2.2250738585072018771558785585789482407880088486837041956131300312119688603996006965297904292212628858639037013670281908017171296072711910355127227413175152199055740043138804567803233377539881639177387328959246074229270113078053813397081653361296447449529789521218979090783852583365901851789618799885150427514782636076021680436220311292700454832073964845713103912225963935608322440623896907276890186717054549275173986589324810401738228328251245795065655738191038008646911615828719989708647293221449796971546706720399791990809160347625980385995424739847678861180095072511543762389603716215171729816011544604359531284325406441938645324905389137795680915804792405099227413854274942620542640408839836919187418172987793340279242767544565229087538682506419718265533447265625 • 10^-308.
  • 2^-1022 + 2•2^-1074 = 2.225073858507202371221524399825492417356801716905076560672932645536733285985283197205297699430014751163740063003020570598281825052988921962169433097257311618680370015095758583081036528065392691763555900744906711111645647364804112062758171723538798309937366264595295182248000398368305570577036006227080633922504922164288936230661591439894977428478987977026639696679140794688312373772389232659678427752122018252042155806801495766953982188063736129641369100312575820243717972293621169304087413797478551780397864281278268544917722045363748655580517781818995617950934297749406849316597964346304638590078974833882923081797242441461636291003104968899481242069589385613709015202152589845793237400783350172912858237869043043055848553508913045817507736501283943653106689453125 • 10^-308.
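Exact expansions like the ones above can also be reproduced in Python. The sketch below relies on the standard-library `decimal.Decimal` constructor, which converts a float to its exact decimal value without rounding. The dictionary name `neighbors` and its labels are chosen here only for illustration.

```python
from decimal import Decimal

realmin = 2.0 ** -1022
smallest_sub = 2.0 ** -1074

# The four representable values around realmin. Each is an exact multiple
# of 2^-1074, so these additions and subtractions involve no rounding.
neighbors = {
    "2^-1022 - 2^-1074": realmin - smallest_sub,
    "2^-1022": realmin,
    "2^-1022 + 2^-1074": realmin + smallest_sub,
    "2^-1022 + 2*2^-1074": realmin + 2 * smallest_sub,
}

for label, value in neighbors.items():
    # Decimal(float) is an exact conversion, so every digit printed is real.
    print(label, "=", Decimal(value))
```

Each printed value is the full decimal expansion followed by `E-308`, so the output lines are several hundred digits long.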

Now we can see why 2^-1022 must be displayed as “2.2250738585072014e-308”. If it were displayed with one fewer digit, as “2.225073858507201e-308”, that would be closer to 2^-1022 - 2^-1074 than to 2^-1022, so it would be wrong.

However, for 2^-1022 + 2^-1074, “2.225073858507202e-308” suffices because the closest representable value to that is 2^-1022 + 2^-1074. 2^-1022 + 2•2^-1074 is farther away.
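These nearest-value claims can be checked with exact rational arithmetic. The following is a minimal sketch using the standard-library `fractions.Fraction`, which accepts both floats (converted exactly) and decimal strings in scientific notation. The helper `nearest` and the candidate labels are hypothetical names introduced for this check.

```python
from fractions import Fraction

realmin = 2.0 ** -1022
ulp = 2.0 ** -1074  # spacing between representable values near realmin

candidates = {
    "realmin - ulp": realmin - ulp,
    "realmin": realmin,
    "realmin + ulp": realmin + ulp,
    "realmin + 2*ulp": realmin + 2 * ulp,
}


def nearest(decimal_text: str) -> str:
    """Return the label of the representable value closest to the decimal string."""
    target = Fraction(decimal_text)  # exact value of the decimal text
    return min(candidates, key=lambda label: abs(Fraction(candidates[label]) - target))


print(nearest("2.225073858507201e-308"))   # realmin - ulp
print(nearest("2.2250738585072014e-308"))  # realmin
print(nearest("2.225073858507202e-308"))   # realmin + ulp
```

The 16-digit string for realmin lands on its lower neighbor, which is why realmin needs a 17th digit, while the 16-digit string for the sum already identifies it uniquely.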

Eric Postpischil
  • After reading your answer I think that's exactly what's happening in my case, so thank you for taking the time. I'm curious: what did you use to calculate the exact values? – zcb Dec 18 '19 at 22:01
  • @zcb: I used C compiled with Clang on macOS. Apple put in some work to ensure the C displays of floating-point values are correct (when you request enough digits). For example, you can show one of the numbers above with `#include <stdio.h>` / `int main(void) { printf("%.9999g\n", 0x1p-1022 + 0x1p-1074); }`. – Eric Postpischil Dec 18 '19 at 22:17