Numbers bigger than 8 bytes in C

Question

I am writing some code to deal with numbers in C which are bigger than 8 bytes in size (don't fit into unsigned long). In this example I will use 16 bytes (128 bits) as the width. The numbers are unsigned and integers (no decimal places). They are stored as an array of unsigned chars eg:

unsigned char n[16];

I have managed to get addition to work (it works like an unsigned number in C so if you had a number which was 0xffffffffffffffffffffffffffffffff (2**128) and you were to add 1 you would get 0. I have managed to get addition to work, but I cannot get subtraction to work. I would assume it would be similar code to addition, but I don't seem to be able to get it to work.

Addition code:

//a and b are numbers
unsigned char *add(unsigned char *a, unsigned char *b){
    unsigned char *c = malloc(NUM_SIZE);
    //d is the carry and c is the output number
    unsigned short d = 0;

    if(!c){
        return NULL;
    }
    for(int i = 0; i < NUM_SIZE; i++){
        c[i] = 0;
    }
    for(int i = NUM_SIZE * 2 - 1; i >= 0; i--){
        d += a[i % NUM_SIZE] + b[i % NUM_SIZE];
        c[i % NUM_SIZE] = d % 256;
        d >>= 8;
    }
    return c;
}

NUM_SIZE is defined as 16 (the width of the number in bytes)

What I have tried:

//changing the signs to minuses
d -= a[i % NUM_SIZE] - b[i % NUM_SIZE];

//changing the some signs to minuses
d -= a[i % NUM_SIZE] + b[i % NUM_SIZE];
//or
d += a[i % NUM_SIZE] - b[i % NUM_SIZE];

//looping through the number backwards
for(int i = 0; i < NUM_SIZE * 2; i++)

Have you tried anything for the substraction? Can we see it please? — Silveris, Jun 08 '17 at 15:46
Why `*2` in `NUM_SIZE * 2` and not `for(int i = NUM_SIZE - 1; i >= 0; i--){`? — chux - Reinstate Monica, Jun 08 '17 at 15:51
Substraction will be very similar to the addition, the main thing that will change will be the carry. — Silveris, Jun 08 '17 at 15:52
Is `a[0]` or `a[NUM_SIZE-1]` the the intended least significant byte? — chux - Reinstate Monica, Jun 08 '17 at 15:54
It's quite hard to implement huge integers correctly. It's just primary school maths, but the details are fiddly. — Malcolm McLean, Jun 08 '17 at 15:55
@MalcolmMcLean: Actually clever bignum algorithms (notably for multiplication & division) exist, and they are much more difficult &clever than primary school. — Basile Starynkevitch, Jun 08 '17 at 15:59
Consider what code `for(int i = NUM_SIZE * 2 - 1; i >= 0; i--){ ...c[i % NUM_SIZE] = d % 256;` does. It sets twice each element of `c[]`, overwriting whatever it did the first time. — chux - Reinstate Monica, Jun 08 '17 at 16:39
You need to do the 2's complement and then addition manually. — aiao, Jun 09 '17 at 08:52

score 4 · Answer 1 · answered Jun 08 '17 at 15:54

4

You may want to use arbitrary-precision arithmetic, a.k.a. as bigint or bignum. You should use a library for that (because bignum algorithms are very clever and use some assembler code). I recommend GMPlib. See also this.

answered Jun 08 '17 at 15:54

Basile Starynkevitch

223,805
18
296
547

zdf · Accepted Answer · 2017-06-09T11:15:02.340

4

Just an idea (not compiled):

void not( unsigned char* a, unsigned int n )
{
  for ( unsigned int i = 0; i < n; ++i )
    a[i] = ~a[i];
}

void inc( unsigned char* a, unsigned int n )
{
  for ( unsigned int i = 0; i < n; ++i )
    if ( ++a[i] )
      return;
}

void add( unsigned char* c, unsigned char* a, unsigned char* b, unsigned int n )
{
  for ( unsigned int i = 0, r = 0; i < n; ++i )
    c[i] = r = a[i] + b[i] + ( r >> 8 );
}

void sub( unsigned char* c, unsigned char* a, unsigned char* b, unsigned int n )
{
  not( b, n );
  add( c, a, b, n );
  not( b, n ); // revert
  inc( c, n );
}

edited Jun 09 '17 at 11:15

answered Jun 08 '17 at 16:55

zdf

4,382
3
18
29

1

Clean and simple! The type of `i` should be the same as that of `n`, preferably `size_t`. – chqrlie Jun 09 '17 at 07:08
The code assumes 8-bit bytes. You might want to use type `uint8_t` for the arrays, or a larger type such as `uint16_t` or `uint32_t` for fewer iterations. – chqrlie Jun 09 '17 at 07:09
1

@chqrlie You are right, however I am always trying to avoid over pedantic answers. Thank you. – zdf Jun 09 '17 at 11:15

chux - Reinstate Monica · Answer 3 · 2017-06-08T16:43:29.630

NUM_SIZE * 2 does not make sense with malloc(NUM_SIZE); ... for(int i = NUM_SIZE * 2 - 1. Only a loop of NUM_SIZE iterations is needed.

Repaired code

#define NUM_SIZE 8
//a - b
unsigned char *sub(const unsigned char *a, const unsigned char *b) {
  unsigned char *c = malloc(NUM_SIZE);
  if (!c) {
    return NULL;
  }

  // zeroing `c[]` not needed.  Retain that code if desired

  int d = 0;  // Use signed accumulator to save the "borrow"

  // drop *2
  for (int i = NUM_SIZE - 1; i >= 0; i--) {
    d += a[i] - b[i];                // Perform the subtraction
    c[i] = d;                        // Save the 8 least significant bits in c[]
    d = (d - c[i]) / (UCHAR_MAX+1);  // Form the "borrow" for the next loop
  }
  // If d<0 at this point, b was greater than a
  return c;
}

Various performance improvements can be made, but important to get functionality correct first.

it is needed for the unsigned char effect where numbers overflow back around (255 + 1 = 0) — dangee1705, Jun 08 '17 at 16:03
@DanielGee Code needs to be concerned about carry/borrow and over/underflow. Using `NUM_SIZE * 2` is not the best approach. — chux - Reinstate Monica, Jun 08 '17 at 16:09

cmdLP · Answer 4 · 2017-06-08T16:52:05.143

There may be some __int128_t. But if your compiler does not support it you define a struct with hi and lo with the biggest type you have. In c++ you can also add operators similar to the operators you know from the other int_t-s.

typedef struct uint128 {
    uint64_t lo, hi; // lo comes first if you want to use little-endian else hi comes first
} uint128_t;

If you want to double the size, you use uint128_t in a similar struct.

Edit: A simple function to increase the int128:

int128_t& int128_increase(int128_t& value) {
    // increase the low part, it is 0 if it was overflown
    // so increase hi
    if (!(++value.lo)) {
        ++value.hi;
    };
    return value;
};

Edit: A runtime scaled version of ints, I use words, because it is faster in accessing memory:

typedef struct uint_dynamic {
    // the length as a multiple of the wordsize
    size_t length;
    size_t* words;
} uint_dynamic_t;

uint_dynamic_t& uint_dynamic_increase(uint_dynamic_t& value) {
    size_t* ptr = value.words; size_t i = value.length;
    while(i && !(++*ptr)) { ++ptr; --i; };
    return value;
};

Or if you want some constant size, put it clearly into a struct.

#define uint_fixed_SIZE (16 / sizeof(size_t))
typedef struct uint_fixed {
    size_t words[uint_fixed_SIZE];
} uint_fixed_t;

uint_fixed_t& uint_fixed_increase(uint_fixed_t& value) {
    size_t* ptr = value.words; size_t i = uint_fixed_SIZE;
    while(i && !(++*ptr)) { ++ptr; --i; };
    return value;
};

This can be rewritten as a #define-macro, where you replace the specific values by a parameter. Which has similar functionality, by defining specific values and including a file:

File fixed_int.h

// note that here is no #ifndef FILE_H or #pragma once
// to reuse the file

#define _concat1(a, b) a ## b
#define _concat(a, b) _concat1(a, b)
#define _size (-((-fixed_int_size) / sizeof(size_t) / 8))
#ifndef fixed_int_name
    #define _name concat(uint_, fixed_int_size)
#else
    #define _name fixed_int_name
#endif
#define _name_(member) _concat(_concat(_name, _), member)

typedef struct _name {
    size_t words[_size];
} _name_(t);

_name_(t)& _name_(increase)(_name_(t)& value) {
    size_t* ptr = value.words; size_t i = _size;
    while(i && !(++*ptr)) { ++ptr; --i; };
    return value;
};

// undef all defines!
#undef _concat1
#undef _concat
#undef _size
#undef _name
#undef _name_

File my_ints.h

//...

// the following lines define the type uint128_t and the function uint_128_t& uint128_increase(uint128_t&)
#define fixed_int_name uint128 // is optional
#define fixed_int_size 128
#include"fixed_int.h"
#undef fixed_int_size
#undef fixed_int_name

//...

unfortunatelly this wouldn't work as i may not just be using 128 bits i may use more (which is why I have NUM_SIZE) but thanks — dangee1705, Jun 08 '17 at 15:52

score 1 · Answer 5 · answered Jun 08 '17 at 16:48

Numbers have a "base" that determines the range of each digit (e.g. "base 10" is decimal).

One uint8_t is a single digit in "base 256". One uint16_t is a single digit in "base 65536". One uint32_t is a single digit in "base 4294967296".

For mathematical operations, performance is heavily effected by the number of digits. By using a larger base you need fewer digits for the same number, which improves performance (until you exceed the CPU's native word size).

For subtraction of unsigned numbers:

#define DIGITS 4

int subtract(uint32_t *result, uint32_t *src1, uint32_t *src2) {
    int carry = 0;
    int oldCarry;
    int i;

    for(i = 0; i < DIGITS; i++) {
        oldCarry = carry;
        if(src2[i] < src1[i]) {
            carry = 1;
        } else if( (src2[i] == src1[i]) && (oldCarry != 0) ) {
            carry = 1;
        } else {
            carry = 0;
        }
        result[i] = src1[i] - src2[i] - oldCarry;
    }
    return carry;
}

Curious, any particular reason for using `uint32_t` rather than `unsigned` concerning "improves performance (until you exceed the CPU's native word size"? — chux - Reinstate Monica, Jun 08 '17 at 17:47
@chux: Partly habit, and partly because it makes it easier later (for printing the numbers, creating static numbers, etc) — Brendan, Jun 08 '17 at 18:02

Numbers bigger than 8 bytes in C

5 Answers5