0

This question is kind of long-detailed and took a lot of my time; hope you read it till end; thanks for your patience

I've find out that size of data in some programming languages like python isn't really matter, and they can store very big values like huge numbers; and then increase them to something even bigger.
I work with c and I know in that language, variables must have a particular type and size. To store huge integers (which is very bigger than the largest type in c), I know 2 different ways:

First method: Using char pointer and store each digit at particular index

  • First value of pointer can tell us the sign.
  • Pointer can stop at particular value (like \0 in strings but now we use $).

to store 123456789 pointer would be like this:

++++++++++++++++++++++++++++++++++++++++++++
| + | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9| $ |
++++++++++++++++++++++++++++++++++++++++++++
  ^                                      ^
  |                                      |
 sign                                 finisher

Second method: Using unsigned int pointer and store numbers like computers store binaries (I mean store them based on size of int)

  • First value of pointer can tell us the sign
  • Second value of pointer can tell us the length

to store 123456789 pointer would be like this:

// 123456789 MOD 2^32 = 123456789
+++++++++++++++++++++++++++++++++++
|     +     |     1     |123456789|
+++++++++++++++++++++++++++++++++++
  ^               ^           ^
  |               |           |
 sign           length      value

and to store something bigger like 9998877665544332211 the pointer would be as in below:

// 9998877665544332211 / 2^32 = 2328045122
// 9998877665544332211 MOD 2^32 = 2942002099
+++++++++++++++++++++++++++++++++++++++++++++++
|     +     |     2     |2942002099|2328045122|
+++++++++++++++++++++++++++++++++++++++++++++++
  ^               ^           ^         ^
  |               |           |         |
 sign           length        +---------+
                                   |
                                values

For simplification I don't use only first value of pointer to keep both sign and size.

issues:

  • If I use First method, it will waste a lot of memory (e.g. for 18-digit First method needs 20 bytes and Second method needs 16 bytes; the difference will be greater in bigger values).

  • Second method needs more calculation if we want print it as decimal.

Questions:
Which method you suggest me to use?
Which method is more similar to that one languages like python use?
Is there any better method?

PS: Yeah, using struct can make it easier, but that's not my point; I just wanna know storing each digit separately is better than storing number based on int's size or not. Please don't refer me to external libraries like GMP. I wanna know what such libraries internally do.

Steve Summit
  • 45,437
  • 7
  • 70
  • 103
Tim
  • 3
  • 2
  • 1
    Obviously if you are going to do arithmetic, storing one digit per element is going to be a lot less efficient. You can see that from the trivial example `1234 * 5678` when storing 1 digit per element will require 16 multiplications and more additions with the basic 'longhand' method, but a larger radix will only require 1 operation. – Weather Vane Aug 05 '22 at 18:20
  • 1
    There are quite a few 'big number' libraries. However, asking about them is officially off-topic on SO. GCC uses [GNU MultiPrecision (GMP)](https://gmplib.org/), and [MPC](https://www.multiprecision.org/mpc/) and [MPFR](https://www.mpfr.org/), for example. You can perform searches for similar names to find your own options. – Jonathan Leffler Aug 05 '22 at 18:30
  • 1
    How long is a piece of string? It depends on how much and what computation you need to do. On the one hand a power-of-2 radix can ease the computation, but a compromise is to use 10^9 radix (with 32-bit elements) making the conversion to and from a decimal string quite easy. – Weather Vane Aug 05 '22 at 18:31
  • *I wanna know what such libraries internally do.* Set off and write one! – Weather Vane Aug 05 '22 at 18:35
  • Storing as individual decimal digits is pretty much the worst of both worlds. Yes, it's easy to input and print out as decimal, but it's hugely inefficient in both space and time. If you store an array of bigger integers, as you've shown, the question is, what's the "base"? You showed doing things mod 2^32, which makes for maximally efficient storage and computation, but difficulty converting to/from base 10. But you could also store things mod 10^9, which is just about as efficient in space and speed, and also easy to convert to/from base 10. – Steve Summit Aug 05 '22 at 18:37
  • This is like a variant of [Binary Coded Decimal](https://en.wikipedia.org/wiki/Binary-coded_decimal) that was popular in the 1950s, but it was proven to be a giant mess to scale up and so was abandoned. – tadman Aug 05 '22 at 19:01
  • @tadman And your point is? – Steve Summit Aug 05 '22 at 21:37
  • @SteveSummit That this approach has a name and history behind it. – tadman Aug 05 '22 at 22:37
  • @tadman Well, it may have been abandoned for hardware arithmetic, but it's clearly alive and well for software multiprecision arithmetic, and I'm not aware that there's anything wrong with it, or that there are any viable alternatives! (Tim's "first method" is indeed BCD, but the second is more like Binary Coded Base-4294967296.) – Steve Summit Aug 05 '22 at 22:46
  • @Tim You can see some stripped-down examples of doing this sort of thing in the answers to [this question](https://stackoverflow.com/questions/72013383) and [this question](https://stackoverflow.com/questions/66109785), and in [this question](https://stackoverflow.com/questions/70277512). Also in [this strange answer](https://stackoverflow.com/questions/69586409#69590952). – Steve Summit Aug 05 '22 at 22:52
  • What's is your purpose of these numbers? What is the required precision? If precision isn't a big deal, use floating point numbers as intermediary and convert from/to integer. – meaning-matters Aug 05 '22 at 23:04

2 Answers2

3

Neither is ideal. For both storage and speed the best way (in a generic context) is having your number binary encoded in 2s complement in an array of bits grouped by the largest data size of the underlying architecture (e.g. unsigned long long for x64 or unsigned int for x86).

more calculation if we want print it as decimal.

Unless you have a very specific use case, most of the time is spent in calculations between bignums. To and from decimal conversion is usually done only on IO which is slow anyway so it doesn't matter that much. Slower conversion to and from decimal is an acceptable tradeoff in favor of bignum calculation performance.

I definitely don't recommend building your own bignum library except for academic purposes (which can be an excellent exercise if you are into this). Beyond difficulty of implementation, making it fast requires deep knowledge and vast experience in system architectures, operating systems and compilers. Just use an existing proven library like bignum, tiny-bignum, GMP etc.

wanna know what such libraries internally do.

Most of them are open source so ... go look around!

bolov
  • 72,283
  • 15
  • 145
  • 224
  • Absolutely right on the difficulties — but I would add that it is an *excellent* academic exercise! (I've got a homebrew bignum library I play with, and while it's definitely a gratuitous wheel reinvention and will never come anywhere close to the performance of GMP, it's tons of fun and I've learned a lot from writing it. 121883461705416278634058248038432770474079746486049) – Steve Summit Aug 05 '22 at 18:42
  • @SteveSummit absolutely – bolov Aug 05 '22 at 18:52
2

What you're looking for is arbitrary precision arithmetic or "bignum". It rapidly gets complicated to do it efficiently and is way too much to explain in an answer here.

Is there any better method?

Yes, use an existing library.

Which method is more similar to that one languages like python use?

libmpedc is what Python uses. Ruby will use GMP if available.

As an example, this is the decimal struct from libmpedc.

#include <mpdecimal.h>

typedef struct mpd_t {
       uint8_t flags;       // [memory flags] | [specials] | sign
       mpd_ssize_t exp;     // exponent
       mpd_ssize_t digits;  // number of decimal digits
       mpd_ssize_t len;     // number of words in use
       mpd_ssize_t alloc;   // number of allocated words
       mpd_uint_t *data;    // words
} mpd_t;

It's using unsigned integers, but also a lot of other tricks.

I would suggest poking around in their source code to learn more. Particularly mpd_set_string (creating a decimal from a string) and the basic arithmetic functions.

Schwern
  • 153,029
  • 25
  • 195
  • 336
  • That's what Python's `decimal` module uses. Python's native integers and rationals use an idiosyncratic bignum implementation based on 30- or 15-bit binary units (depending on the platform). There's also a [Python binding for GMP](https://pypi.org/project/gmpy2/). – rici Aug 05 '22 at 20:27