We can efficiently generate the sequence in order by merging the appropriate multiples of the sequence of Hamming numbers, that is the classical algorithm.
If n > 1
is a Hamming number divisible by p
, then n/p
is also a Hamming number, and if m
is a Hamming number and p
one of 2, 3, or 5, then m*p
is also a Hamming number.
So we can describe the sequence of Hamming numbers as
H = 1 : (2*H ∪ 3*H ∪ 5*H)
where p*H
is the sorted sequence obtained by multiplying all Hamming numbers with p
, and ∪
means the sorted union (so with H = 1, 2, 3, 4, 5, 6, 8, 9, 10, 12, ...
, e.g. 2*H = 2, 4, 6, 8, 10, 12, 16, 18, 20, 24, ...
and 2*H ∪ 3*H = (2, 4, 6, 8, 10, 12, 16, ...) ∪ (3, 6, 9, 12, 15, ...) = (2, 3, 4, 6, 8, 9, 10, 12, 15, 16, ...)
).
This algorithm has two downsides, though. First, it produces duplicates that must be eliminated in the merging (∪
) step. Second, to generate the Hamming numbers near N
, the Hamming numbers near N/5
, N/3
and N/2
need to be known, and the simplest way to achieve that is to keep the part of the sequence between N/5
and N
in memory, which requires quite a bit of memory for large N
.
A variant that addresses both issues starts with the sequence of powers of 5,
P = 1, 5, 25, 125, 625, 3125, ...
and in a first step produces the numbers having no prime factors except 3 or 5,
T = P ∪ 3*T (= 1 : (5*P ∪ 3*T))
(a number n
having no prime factors except 3 and 5 is either a power of 5 (n ∈ P
), or it is divisible by 3 and n/3
also has no prime factors except 3 and 5 (n ∈ 3*T
)). Obviously, the sequences P
and 3*T
are disjoint, so no duplicates are produced here.
Then, finally we obtain the sequence of Hamming numbers via
H = T ∪ 2*H
Again, it is evident that no duplicates are produced, and to generate the Hamming numbers near N
, we need to know the sequence T
near N
, which requires knowing P
near N
and T
near N/3
, and the sequence H
near N/2
. Keeping only the part of H
between N/2
and N
, and the part of T
between N/3
and N
in memory requires much less space than keeping the part of H
between N/5
and N
in memory.
A rough translation of my Haskell code to C++ (unidiomatic, undoubtedly, but I hardly ever write C++, and the C++ I learned is ancient) yields
#include <iostream>
#include <cstdlib>
#include <vector>
#include <algorithm>
#include <gmpxx.h>
class Node {
public:
Node(mpz_class n) : val(n) { next = 0; };
mpz_class val;
Node *next;
};
class ListGenerator {
public:
virtual mpz_class getNext() = 0;
virtual ~ListGenerator() {};
};
class PurePowers : public ListGenerator {
mpz_class multiplier, value;
public:
PurePowers(mpz_class p) : multiplier(p), value(p) {};
mpz_class getNext() {
mpz_class temp = value;
value *= multiplier;
return temp;
}
// default destructor is fine here
// ~PurePowers() {}
};
class Merger : public ListGenerator {
mpz_class multiplier, thunk_value, self_value;
// generator of input sequence
// to be merged with our own output
ListGenerator *thunk;
// list of our output we need to remember
// to generate the next numbers
// Invariant: list is never empty, and sorted
Node *head, *tail;
public:
Merger(mpz_class p, ListGenerator *gen) : multiplier(p) {
thunk = gen;
// first output would be 1 (skipped here, though)
head = new Node(1);
tail = head;
thunk_value = thunk->getNext();
self_value = multiplier;
}
mpz_class getNext() {
if (thunk_value < self_value) {
// next value from the input sequence is
// smaller than the next value obtained
// by multiplying our output with the multiplier
mpz_class num = thunk_value;
// get next value of input sequence
thunk_value = thunk->getNext();
// and append our next output to the bookkeeping list
tail->next = new Node(num);
tail = tail->next;
return num;
} else {
// multiplier * head->val is smaller than next input
mpz_class num = self_value;
// append our next output to the list
tail->next = new Node(num);
tail = tail->next;
// and delete old head, which is no longer needed
Node *temp = head->next;
delete head;
head = temp;
// remember next value obtained from multiplying our own output
self_value = head->val * multiplier;
return num;
}
}
~Merger() {
// delete wrapped thunk
delete thunk;
// and list of our output
while (head != tail) {
Node *temp = head->next;
delete head;
head = temp;
}
delete tail;
}
};
// wrap list generator to include 1 in the output
class Hamming : public ListGenerator {
mpz_class value;
ListGenerator *thunk;
public:
Hamming(ListGenerator *gen) : value(1) {
thunk = gen;
}
// construct a Hamming number generator from a list of primes
// If the vector is empty or contains anything but primes,
// horrible things may happen, I don't care
Hamming(std::vector<unsigned long> primes) : value(1) {
std::sort(primes.begin(), primes.end());
ListGenerator *gn = new PurePowers(primes.back());
primes.pop_back();
while(primes.size() > 0) {
gn = new Merger(primes.back(), gn);
primes.pop_back();
}
thunk = gn;
}
mpz_class getNext() {
mpz_class num = value;
value = thunk->getNext();
return num;
}
~Hamming() { delete thunk; }
};
int main(int argc, char *argv[]) {
if (argc < 3) {
std::cout << "Not enough arguments provided.\n";
std::cout << "Usage: ./hamming start_index count [Primes]" << std::endl;
return 0;
}
unsigned long start, count, n;
std::vector<unsigned long> v;
start = strtoul(argv[1],NULL,0);
count = strtoul(argv[2],NULL,0);
if (argc == 3) {
v.push_back(2);
v.push_back(3);
v.push_back(5);
} else {
for(int i = 3; i < argc; ++i) {
v.push_back(strtoul(argv[i],NULL,0));
}
}
Hamming *ham = new Hamming(v);
mpz_class h;
for(n = 0; n < start; ++n) {
h = ham->getNext();
}
for(n = 0; n < count; ++n) {
h = ham->getNext();
std::cout << h << std::endl;
}
delete ham;
return 0;
}
which does the job without being too inefficient:
$ ./hamming 0 20
1
2
3
4
5
6
8
9
10
12
15
16
18
20
24
25
27
30
32
36
$ time ./hamming 1000000 2
519381797917090766274082018159448243742493816603938969600000000000000000000000000000
519386406319142860380252256170487374054333610204770704575899579187200000000000000000
real 0m0.310s
user 0m0.307s
sys 0m0.003s
$ time ./hamming 100000000 1
181401839647817990674757344419030541037525904195621195857845491990723972119434480014547
971472123342746229857874163510572099698677464132177627571993937027608855262121141058201
642782634676692520729286408851801352254407007080772018525749444961547851562500000000000
000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
00000000000000000000000000000000000000000000
real 0m52.138s
user 0m52.111s
sys 0m0.050s
(the Haskell version is faster, GHC can optimise idiomatic Haskell better than I can optimise unidiomatic C++)