14

As an optional assignment, I'm thinking about writing my own implementation of the BigInteger class, where I will provide my own methods for addition, subtraction, multiplication, etc.

This will be for arbitrarily long integer numbers, even hundreds of digits long.

While doing the math on these numbers digit by digit isn't hard, what do you think the best data structure would be to represent my "BigInteger"?

At first I was considering using an array, but then I realized I could still potentially overflow (run out of array slots) after a large addition or multiplication. Would this be a good case for a linked list, since I can tack on digits in O(1) time?

Is there some other data structure that would be even better suited than a linked list? Should the type that my data structure holds be the smallest integer type available to me?

Also, should I be careful about how I store my "carry" variable? Should it, itself, be of my "BigInteger" type?

Bill the Lizard
  • 398,270
  • 210
  • 566
  • 880
Fifa89
  • 171
  • 1
  • 4
  • 2
    (a) I don't think you should use a linked list, I'm sure some operations will require (or benefit from) random access. Also, linked lists are slow with all the memory allocations. (b) If you use the smallest integer then you'll use the lowest memory, but if you use whatever matches the size of a word (i.e., an `int`) then you'll be fast. So it depends on what your main concern is. An obvious possibility is to make the integer type a template parameter of your class. (c) Check the GNU MP library, you won't be wrong if you copy some of their design decisions. – Manuel Feb 08 '10 at 19:13
  • 1
    Here's the source code of Java's BigInteger class: http://kickjava.com/src/java/math/BigInteger.java.htm – Manuel Feb 08 '10 at 19:44

10 Answers

3

Check out the book C Interfaces and Implementations by David R. Hanson. It has 2 chapters on the subject, covering the vector structure, word size and many other issues you are likely to encounter.

It's written for C, but most of it is applicable to C++ and/or Java. And if you use C++ it will be a bit simpler because you can use something like std::vector to manage the array allocation for you.
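As a rough illustration of that suggestion, the storage for such a class might be sketched like this in C++ (the struct and field names here are my own, not from the book):

```cpp
#include <cstdint>
#include <vector>

// Sketch: a big integer as a sign flag plus a vector of "limbs" (digits in
// some large base). std::vector handles all growth and deallocation, which
// is the simplification the answer above refers to.
struct BigInt {
    bool negative = false;
    std::vector<uint32_t> limbs;  // little-endian: limbs[0] is least significant
};
```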

finnw
  • 47,861
  • 24
  • 143
  • 221
  • http://code.google.com/p/cii/source/browse/trunk/src/ap.c http://code.google.com/p/cii/source/browse/trunk/examples/calc.c – slf Feb 08 '10 at 19:49
1

Always use the smallest int type that will do the job you need (bytes). A linked list should work well, since you won't have to worry about overflowing.

captncraig
  • 22,118
  • 17
  • 108
  • 151
  • I would write it in base 2**(machine_word-1) (base 65536 for 32 bit machines). Why waste time processing single bytes, when you can process whole words at once. – Tadeusz A. Kadłubowski Feb 08 '10 at 19:14
  • Overflowing is not really an issue, he can use a vector or a deque which will be both much more efficient than a linked list for this application. – Manuel Feb 08 '10 at 19:16
  • 4
    I refuse to believe that 2**31 is 65536 – Ponkadoodle Feb 08 '10 at 19:29
  • 1
    A `std::list` uses two link pointers per node. On a 64-bit system, that would be 16 bytes of links and 1 byte of data… which gets padded out to 8. The appropriate type for this job is `int`, which may be 64 or 32 (or even 16) bits — the class should detect which and adjust itself if necessary. Also, a linked list is totally inappropriate. – Potatoswatter Feb 08 '10 at 21:54
  • why have a list at all? why not just have 1 integer and 1 counter, when the integer reaches its max value, set it back to 0 and increment the counter...eg: "integer + counter * Max_Integer". this would be the output, as a string. – Anthony Raimondo Apr 17 '13 at 20:21
1

If you use binary trees (whose leaves are ints), you get all the advantages of a linked list (unbounded number of digits, etc.) with simpler divide-and-conquer algorithms. In that case you do not have a single base but many, depending on the level at which you're working.

If you do this, you need to use a BigInteger for the carry. You may consider it an advantage of the "linked list of ints" approach that the carry can always be represented as an int (and this is true for any base, not just the base 10 that most answers seem to assume you should use; in any base, the carry is always a single digit).

I might as well say it: it would be a terrible waste to use base 10 when you can use 2^30 or 2^31.

Pascal Cuoq
  • 79,187
  • 7
  • 161
  • 281
1

Accessing elements of a linked list is slow. I think arrays are the way to go, with plenty of bounds checking and run-time array resizing as needed.


Clarification: Traversing a linked list and traversing an array are both O(n) operations, but traversing a linked list requires dereferencing a pointer at each step. Just because two algorithms have the same complexity doesn't mean they take the same time to run. The overhead of allocating and deallocating n nodes in a linked list will also be much heavier than the memory management of a single array of size n, even if the array has to be resized a few times.
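To illustrate the allocation point: a std::vector grows geometrically, so appending n elements triggers only O(log n) reallocations, versus one allocation per node for a linked list. A small sketch that counts reallocations by watching the buffer pointer (the helper name is made up for this example):

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Counts how many times the vector's underlying buffer moves while
// push_back-ing n elements. Geometric growth keeps this to O(log n).
int countReallocations(size_t n) {
    std::vector<int> v;
    int reallocs = 0;
    const int* prev = nullptr;
    for (size_t i = 0; i < n; ++i) {
        v.push_back(0);
        if (v.data() != prev) { ++reallocs; prev = v.data(); }
    }
    return reallocs;
}
```

For 1000 elements this is typically around 10-20 reallocations, while a std::list would perform 1000 node allocations.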

mob
  • 117,087
  • 18
  • 149
  • 283
  • 1
    How are they slow? I'm not doing random access for any of these operations, just sequential. This would make the linked-list just as fast as the vector for sequential access and better speed for insertions. – Fifa89 Feb 08 '10 at 19:12
  • @Fifa89 - a linked list is faster for insertions in the middle, but for what you want to do you'll be adding elements at the end, and for that a vector or a deque are just as fast. – Manuel Feb 08 '10 at 19:17
  • @Manuel, I keep seeing you say this, but can you please provide an answer showing it? As far as I know, a linked-list provides O(1) insertion time, and the vector provides an amortized O(1) insertion. Can you elaborate on what makes the vector faster? – Fifa89 Feb 08 '10 at 19:21
  • @Fifa89 - I assumed you would only add elements at the end (with push_back). A linked list will allocate on every push_back whereas a vector will allocate every N push_backs. For such a performance-sensitive class as BigNum this will make a huge difference. – Manuel Feb 08 '10 at 19:30
  • @Fifa89 - Are you sure traversal of a linked list is as fast as the same operation on a vector? I'm sure that for large BigNums a vector will cause much fewer cache misses. – Manuel Feb 08 '10 at 19:42
@Fifa89, @Manuel: as long as the number fits in cache, both will be linear time with no cache misses. However, the linked list will be bound by the number of L1 *hits* required to traverse it, as you have to access all the links. Also, the linked list will fill up the cache and cause misses for a smaller number, because it clogs the cache with pointers. – Potatoswatter Feb 08 '10 at 22:07
1

Wow, there are some… interesting answers here. I'd recommend reading a book rather than trying to sort through all this contradictory advice.

That said, C/C++ is also ill-suited to this task. Big-integer arithmetic is a kind of extended-precision math. Most CPUs provide instructions to handle extended-precision math at comparable or the same speed (bits per instruction) as normal math. When you add 2^31 + 2^31 in 32-bit registers, the answer is 0, but there is also a special carry output from the processor's ALU which a program can read and use.

C++ cannot access that flag, and neither can C. You have to use assembler.

Just to satisfy curiosity, you can recover the carry bits using standard Boolean and comparison arithmetic. But you will be much better off downloading an existing library.
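A sketch of that trick (the function name is my own): unsigned overflow wraps around in a well-defined way, and a wrapped sum is smaller than either operand, which lets you reconstruct the carry flag in portable C++:

```cpp
#include <cassert>
#include <cstdint>

// Adds two 32-bit limbs plus a carry-in (0 or 1), returning the wrapped sum
// and updating `carry` to the carry-out. A carry occurred iff the wrapped
// sum is smaller than an operand; the extra test handles the corner case
// where b + carry itself wraps to 0.
uint32_t addWithCarry(uint32_t a, uint32_t b, uint32_t& carry) {
    uint32_t sum = a + b + carry;  // unsigned arithmetic: wraps modulo 2^32
    carry = (sum < a || (carry && b == UINT32_MAX)) ? 1 : 0;
    return sum;
}
```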

Potatoswatter
  • 134,909
  • 25
  • 265
  • 421
  • 1
If he wanted to use an existing library he would have just used the BigInteger class he mentioned in his post. Sometimes people do things to learn and not to put into some production environment. I think this is a good CS assignment and one that every CS student should do at some point. – mmcdole Feb 08 '10 at 22:41
  • @simucal: there's also learning the little assembler you need for extended multiplication & addition, which is educational too… – Potatoswatter Feb 09 '10 at 00:23
0

I would say an array of ints.

fastcodejava
  • 39,895
  • 28
  • 133
  • 186
0

I would say a std::vector of char (since it only has to hold 0-9), if you plan to work in BCD.

If not BCD, then use a vector of int (you didn't make it clear which).

Either has much less space overhead than a linked list.

And the standard advice is: use a vector unless you have a good reason not to.

pm100
  • 48,078
  • 23
  • 82
  • 145
0

An array is indeed a natural fit. I think it is acceptable to throw an OverflowException when you run out of memory; the teacher will see attention to detail.

A multiplication roughly doubles the number of digits, while addition increases it by at most one. It is easy to create a sufficiently large array to store the result of your operation.

In multiplication, the carry is at most one digit long (9*9 = 81: write 1, carry 8). A single int will do.
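The sizing rules above (with the correction from the comments that a product of an m-digit and an n-digit number has at most m + n digits) can be written down directly; the helper names are illustrative:

```cpp
#include <cassert>
#include <cstddef>

// Upper bounds on result sizes, usable to pre-allocate the result array.
// A product of m- and n-digit numbers has at most m + n digits;
// a sum has at most max(m, n) + 1 digits.
size_t mulResultDigits(size_t m, size_t n) { return m + n; }
size_t addResultDigits(size_t m, size_t n) { return (m > n ? m : n) + 1; }
```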

Tadeusz A. Kadłubowski
  • 8,047
  • 1
  • 30
  • 37
  • Multiplication results in roughly digits_a + digits_b + [0-2] digits in the product...which is not more than double the number in the larger input, but could be considerably less than that. – dmckee --- ex-moderator kitten Feb 08 '10 at 19:14
0

std::vector<bool> or std::vector<unsigned int> is probably what you want. You will have to push_back() or resize() them as you need more space for multiplications, etc. Also, remember to push_back the correct sign bits if you're using two's complement.

Inverse
  • 4,408
  • 2
  • 26
  • 35
0

As a rule of thumb, use std::vector instead of std::list, unless you need to insert elements in the middle of the sequence very often. Vectors tend to be faster, since they are stored contiguously and thus benefit from better spatial locality (a major performance factor on modern platforms).

Make sure you use elements that are natural for the platform. If you want to be platform independent, use long. Remember that unless you have some special compiler intrinsics available, you'll need a type at least twice as large to perform multiplication.

I don't understand why you'd want carry to be a big integer. Carry is a single bit for addition and element-sized for multiplication.

Make sure you read Knuth's Art of Computer Programming, algorithms pertaining to arbitrary precision arithmetic are described there to a great extent.
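A sketch of the "twice as large type" point for one limb of a schoolbook multiplication, using 32-bit limbs and a 64-bit intermediate (the function name is made up):

```cpp
#include <cassert>
#include <cstdint>

// Multiplies two 32-bit limbs and adds a carry-in. The 64-bit intermediate
// cannot overflow: (2^32-1)^2 + (2^32-1) = 2^64 - 2^32 < 2^64. The low
// 32 bits stay in this position; the high 32 bits carry into the next limb.
void mulLimb(uint32_t a, uint32_t b, uint32_t carryIn,
             uint32_t& lo, uint32_t& carryOut) {
    uint64_t product = (uint64_t)a * b + carryIn;
    lo = (uint32_t)product;
    carryOut = (uint32_t)(product >> 32);
}
```

This is exactly why, without a double-width type or a compiler intrinsic, multiplication becomes much more awkward to write.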

avakar
  • 32,009
  • 9
  • 68
  • 103
  • May I ask why long? Are there any guarantees that this will map to the architecture's word size? – Manuel Feb 08 '10 at 21:16
  • There aren't, it's just an educated guess (it does map to natural word size under msvc and gcc on intel). – avakar Feb 08 '10 at 21:41