How to compute huge numbers with python?

Question

I'm currently trying to find the value of x,

x = (math.log(X) - math.log(math.fabs(p))/math.log(g))

with :

X = 53710695204323513509337733909021562547350740845028323195225592059762435955297110591848019878050853425581981564064692996024279718640577281681757923541806197728862534268310235863990001242041406600195234734872865710114622767319497082014412908147635982838670976889326329911714511434374891326542317244606912177994106645736126820796903212224

p = 79293686916250308867562846577205340336400039290615139607865873515636529820700152685808430350565795397930362488139681935988728405965018046160143856932183271822052154707966219579166490625165957544852172686883789422725879425460374250873493847078682057057098206096021890926255094441718327491846721928463078710174998090939469826268390010887

g = 73114111352295288774462814798129374078459933691513097211327217058892903294045760490674069858786617415857709128629468431860886481058309114786300536376329001946020422132220459480052973446624920516819751293995944131953830388015948998083956038870701901293308432733590605162069671909743966331031815478333541613484527212362582446507824584241

Unfortunately python can't handle natively such big numbers.

Does somebody know a way to solve this ?

Thank you for your help.

EDIT

Since a lot you wonder what i'm trying to do:

To be able to communicate in a secured fashion Alice and Bob proceed to a Diffie-Hellman key exchange. To that end they use the prime number p :

p = 79293686916250308867562846577205340336400039290615139607865873515636529820700152685808430350565795397930362488139681935988728405965018046160143856932183271822052154707966219579166490625165957544852172686883789422725879425460374250873493847078682057057098206096021890926255094441718327491846721928463078710174998090939469826268390010887

And the integer g:

g = 73114111352295288774462814798129374078459933691513097211327217058892903294045760490674069858786617415857709128629468431860886481058309114786300536376329001946020422132220459480052973446624920516819751293995944131953830388015948998083956038870701901293308432733590605162069671909743966331031815478333541613484527212362582446507824584241

Alice chooses the secret number x, she calculates X=g^x mod p and sends X through an unsecured channel to Bob. Bob chooses the secret number y, he calculates Y=g^y mod p and sends Y through the same unsecured channel to Alice. Both can calculate the value Z = X^y = Y^x = g^xy mod p

By spying on the channel, Charlie retrieves the value of X and Y:

X = 53710695204323513509337733909021562547350740845028323195225592059762435955297110591848019878050853425581981564064692996024279718640577281681757923541806197728862534268310235863990001242041406600195234734872865710114622767319497082014412908147635982838670976889326329911714511434374891326542317244606912177994106645736126820796903212224

Y = 17548462742338155551984429588008385864428920973169847389730563268852776421819130212521059041463390276608317951678117988955994615505741640680466539914477079796678963391138192241654905635203691784507184457129586853997459084075350611422541722123509121359133932497700621300814065254996649070135358792927275914472632707420292830992294921992

The key of this exercice is the md5sum of the value of Z

Are the result and the intermediate values integers? Presumably not, but your inputs (in particular `p`) *are* and you apply `fabs` to one of them... — , Feb 21 '14 at 21:55
NumPy won't handle this any better than vanilla Python. In fact, it'll break down sooner due to different long->float conversion handling. — user2357112, Feb 21 '14 at 22:00
If they're integers, python *does* handle such big numbers. arbitrary large in fact. If you need floating points, you should try the `Decimal` module, or `mpmath` — M4rtini, Feb 21 '14 at 22:17
Please name your Python version and give a runnable code snippet that reproduces the error(s) you see. — , Feb 21 '14 at 22:27

score 8 · Accepted Answer · answered Feb 21 '14 at 21:57

8

You can do this with the Decimal library:

from decimal import Decimal

X = 53710695204323513509337733909021562547350740845028323195225592059762435955297110591848019878050853425581981564064692996024279718640577281681757923541806197728862534268310235863990001242041406600195234734872865710114622767319497082014412908147635982838670976889326329911714511434374891326542317244606912177994106645736126820796903212224
p = 79293686916250308867562846577205340336400039290615139607865873515636529820700152685808430350565795397930362488139681935988728405965018046160143856932183271822052154707966219579166490625165957544852172686883789422725879425460374250873493847078682057057098206096021890926255094441718327491846721928463078710174998090939469826268390010887
g = 73114111352295288774462814798129374078459933691513097211327217058892903294045760490674069858786617415857709128629468431860886481058309114786300536376329001946020422132220459480052973446624920516819751293995944131953830388015948998083956038870701901293308432733590605162069671909743966331031815478333541613484527212362582446507824584241
X=Decimal(X)
p=Decimal(p)
g=Decimal(g)

print X.ln() - abs(p).ln()/g.ln()

gives

769.7443428855116199351294830

answered Feb 21 '14 at 21:57

Peter de Rivaz

33,126
4
46
75

2

Vanilla python gives me the exact same result. – LucasB Feb 21 '14 at 22:08
Vanilla python gives same result, and is about 900x faster than the decimal version. In python 2.7 – M4rtini Feb 21 '14 at 22:20
@LucasB: What does "vanilla" Python even mean? The OP is trying to use the `math` module included with Python, and this answer uses the `decimal` module included with Python. So... what are *you* using to get this result? – John Y Feb 21 '14 at 22:20
@LucasB There's an error of about 8e-15 (after converting the float result to `Decimal` so they can be subtracted), which precludes "exact same" but is almost certainly negligible. `Decimal` has the theoretical advantage (not taken advantage of in this answer) of allowing greater precision. -1 for these reasons. – Feb 21 '14 at 22:21
@JohnY I took it to mean "using floats and `math`, like in the question". – Feb 21 '14 at 22:22
2

@LucasB -- "Vanilla" for me did not work, using Python 2.6.5. As-is threw "OverflowError: long int too large to convert to float". Adding ".0" to all ints to pre-convert to floats resulted in no exceptions, but "nan" result. – DreadPirateShawn Feb 21 '14 at 22:22
1

To clarify: vanilla python3.3 using the `math` module and not using `fabs`, but `abs` since `fabs` makes no sense given what the OP tells us. – LucasB Feb 21 '14 at 22:24
@DreadPirateShawn In Python 3.3, `math.log(huge_int)` works and gives the correct result (but `float()` does throw an overflow error as you describe). Presumably this was added somewhere between 2.6 and 3.0? I haven't seen any mention of this in the docs. – Feb 21 '14 at 22:27
1

To anyone using the term "vanilla Python": If you have to import exactly one **standard library** module in either case, how is one more vanilla than the other??? – John Y Feb 21 '14 at 22:27
@JohnY Maybe they mean they are using CPython and not a different implementation? Then I don't get why use "vanilla" which takes 7 more keystrokes. – Bakuriu Feb 21 '14 at 22:29
@delnan What is `math(huge_int)`? `math` is a module, that should raise a `TypeError`... – Bakuriu Feb 21 '14 at 22:29
@JohnY: using builtin types (`long` in this instance) rather than objects of a class defined in a module is definitely more vanilla imo. – LucasB Feb 21 '14 at 23:02

Dietrich · Answer 2 · 2014-02-22T10:57:38.107

Sympy might be interesting for you. You can do symbolic simplifications and adjust the precision you want to use (using mpmath):

import sympy as sy
sy.init_printing() # enable pretty printing in IPython

# Build the expression:
X,p,g = sy.symbols('X,p,g')
expr = (sy.log(X) - sy.log(sy.Abs(p))/sy.log(g))
# expr = expr.simplify()  # doesn't have any benefit in this case

# The values:
vX = 53710695204323513509337733909021562547350740845028323195225592059762435955297110591848019878050853425581981564064692996024279718640577281681757923541806197728862534268310235863990001242041406600195234734872865710114622767319497082014412908147635982838670976889326329911714511434374891326542317244606912177994106645736126820796903212224
vp = 79293686916250308867562846577205340336400039290615139607865873515636529820700152685808430350565795397930362488139681935988728405965018046160143856932183271822052154707966219579166490625165957544852172686883789422725879425460374250873493847078682057057098206096021890926255094441718327491846721928463078710174998090939469826268390010887
vg = 73114111352295288774462814798129374078459933691513097211327217058892903294045760490674069858786617415857709128629468431860886481058309114786300536376329001946020422132220459480052973446624920516819751293995944131953830388015948998083956038870701901293308432733590605162069671909743966331031815478333541613484527212362582446507824584241

# substitute values into variables:
expr2 = expr.subs({X:vX, p:vp,g:vg})

# evaluate to 150 digits with internal precision up to 1000 digits:
print(expr2.evalf(n=150, maxn=1000))

gives as a result:

    769.744342885511619935129482917192487900343653888850271462255718268257261969359878869753342583593581927254506121925469662801405523964742213571689617098

Update: As noted by casevh and David, when using sympy, attention is to be paid to not losing accuracy by using normal floating point numbers as inputs. To clarify, let's calculate 10**log10(10+1e-30), which obviously results in 10+1e-30:

import sympy as sy
import numpy as np

xf = 1e-30

# numpy with floats:
np_x1 = np.log10(10+ xf)
np_yf = 10**np_x1

# sympy with no extra benefit
sy1_x1 = sy.log(10 + xf) / sy.log(10)
sy1_ye = 10**sy1_x1
sy1_yf = sy1_ye.evalf(n=33)

# sympy, done right:
x = sy.symbols('x')
sy2_x1 = sy.log(10 + x) / sy.log(10)
sy2_ye = 10**sy2_x1
sy2_yf = sy2_ye.evalf(n=33, subs={x:xf})

print("correct answer: 10.0000000000000000000000000000010")
print("        numpy:  {:.31f}".format(np_yf))
print("  naive sympy:  " + repr(sy1_yf))
print("correct sympy:  " + repr(sy2_yf))

gives as result:

correct answer: 10.0000000000000000000000000000010
        numpy:  10.0000000000000000000000000000000
  naive sympy:  10.0000000000000017763568394002504
correct sympy:  10.0000000000000000000000000000010

Setting precision to 1000 doesn't really give you 1000 digits of precision. Really, anything past 16 is meaningless. Reference my answer below which cites the sympy documentation. — David Marx, Feb 22 '14 at 02:14
This value is actually correct. The input values to evalf() are all integers so there is no precision loss due to conversion to a 64-bit float. — casevh, Feb 22 '14 at 04:25

score 0 · Answer 3 · answered Feb 22 '14 at 02:12

The "machine epsilon" for the numpy float64 dtype is about 2.2e-16. This means that you should only expect the first 16 significant digits in your result to be accurate unless you are doing something extremely tricky (like rolling your own custom data type).

Dietrich suggested using sympy and setting a high internal precision, but the sympy documentation confirms that this does not escape the machine epsilon issue.

some Python floats are only accurate to about 15 digits as inputs, while others (those that have a denominator that is a power of 2, like .125 = 1/4) are exact.

In other words, be extremely wary of the results you get working with numbers as large as yours, because only the first 16 significant digits will be meaningful as long as you are using 64 bit floats to do your math.

Dietrich's answer is precisely correct to the specified precision. The sentence you quoted only applies when trying to perform high-precision calculations on values that have already been converted to Python floats. Since the inputs to evalf() are integers, there is no loss of precision. — casevh, Feb 22 '14 at 04:23

How to compute huge numbers with python?

3 Answers3

Linked