6

I am trying generate numbers of the form 2^m*3^n*5^l where m, n, and l are natural numbers including 0.

The sequence follows: 1, 2, 3, 4, 5, 6, 8, 9, 10, 12, 15, 16, 18, 20, 24, 25, 27, 30, 32, .....

I am testing it by getting the one millionth number. I implemented it using list comprehension and sorting, but it takes too long. I want a faster solution. I have been spending days trying to do this to no avail.

I do not want a complete solution. I just want to know what Haskell concepts are necessary in accomplishing it.

David Grayson
  • 84,103
  • 24
  • 152
  • 189
Ian Hsl
  • 61
  • 3
  • So, can you just make a function that takes `m`, `n`, and `l` as arguments and returns the number or do you need something more complicated? Are you trying to efficiently generate a sequence of all numbers that have that form and you want them to be in increasing order? – David Grayson Apr 20 '22 at 23:10
  • Yes, I am generating the numbers in increasing order using an infinite list. – Ian Hsl Apr 20 '22 at 23:14
  • 2
    See: [New state of the art in unlimited generation of Hamming sequence](https://stackoverflow.com/questions/12480291/new-state-of-the-art-in-unlimited-generation-of-hamming-sequence) – David Lukas Apr 21 '22 at 05:09
  • see https://stackoverflow.com/questions/55636005/trouble-understanding-visualising-sicp-streams-hamming-numbers-program/55808095#55808095 – Will Ness Apr 22 '22 at 19:25

3 Answers3

5

Here's an approach that doesn't need any Haskell concepts, just some math and computer science.

Grab a library that offers priority queues.

Initialize a priority queue containing only the number 1.

Loop the following indefinitely: extract the minimum value from the queue. Put it next in the output list. Insert that number times 2, 3, and 5 as three individual entries in the queue. Make sure the queue insert function merges duplicates, because there will be a lot of them thanks to commutativity of multiplication.

If you have a maximum you're working up to, you can use it to prune insertions to the queue as a minor optimization. Alternatively, you could take advantage of actual Haskell properties and just return an infinite list using laziness.

Carl
  • 26,500
  • 4
  • 65
  • 86
  • 1
    This is the first thing that occurred to be and it should work. It seems like the size of the priority queue grows like O(N^3) where N is the number of elements you want to generate. The memory for the queue will eventually stop fitting in the CPU cache and go into RAM. So I suspect it's slower in practice than other solutions that use O(1) memory. – David Grayson Apr 21 '22 at 00:20
  • Well, I'm probably wrong about the O(N^3) thing but still it does grow pretty large. – David Grayson Apr 21 '22 at 00:29
  • Yeah, this is approach is better suited to things that have a less approachable pattern. – Carl Apr 21 '22 at 00:30
  • As to the size of the queue - it's going to be holding numbers between the one and it's returning and 5x that number. That's O(n) queue entries as overhead. Still way worse than the others, but not terrible. – Carl Apr 21 '22 at 00:33
  • 2
    the size of the leftover queue [appears to grow as `~N^(2/3)`](https://rosettacode.org/wiki/Hamming_numbers#Explicit_multiples_reinserting), with `N` numbers produced. (multiplying by 5 refers to the number's magnitude -- something completely different). this means N log N time overall at best. Dijkstra's and equivalents' time is linear, since it does not overproduce and its internal queue points backwards and has constant size of 3 elements. – Will Ness Apr 22 '22 at 05:51
  • I think I proved the N^(2/3) claim to myself once, though I don't remember it now. (could make nice question on math exchange / overflow ?) but empirical evidence strongly supports it. – Will Ness Apr 22 '22 at 05:58
2

First, write a function of type Int -> Bool that dermines if a given integer is in the sequence you defined. It would divide the number by 2 as many times as possible (without creating a fraction), then divide it by 3 as many times as possible, and finally divide it by 5 as many times as possible. After all of this, if the number is larger than 1, then it cannot be expressed as a products of twos, threes, and fives, so the function would return false. Otherwise, the number is in your sequence, so the function returns true.

Then take the infinite sequence of integers greater than 0, and use the function above to filter out all numbers that are not in the sequence.

David Grayson
  • 84,103
  • 24
  • 152
  • 189
  • 1
    The 10000th number of the list is 288555831593533440, so will have to wait a bit until you reach it with filtering. – j.p. Apr 21 '22 at 21:38
1

Carl's approach can be improved by inserting less elements when removing the minimal element x: As 2<3<4<5<6 you can just

  • append 3*x/2 if x is even but not divisible by 4
  • append 4*x/3 if x is divisible by 3
  • append 5*x/4 if x is divisible by 4
  • append 6*x/5 if x is divisible by 5

In code it looks like this:

g2 x | mod x 4 == 0 = [5*div x 4]
     | even x       = [3*div x 2]
     | otherwise    = []
g3 x | mod x 3 == 0 = [4*div x 3]
     | otherwise    = []
g5 x | mod x 5 == 0 = [6*div x 5]
     | otherwise    = []
g x = concatMap ($ x) [g2, g3, g5]

So you if your remove the minimal element x from the priority queue, you have to insert the elements of g x into the priority queue. On my laptop I get the millionth element after about 8 min, even if I use just a list instead of the better priority queue, as the list grows only to a bit more than 10000 elements.

j.p.
  • 1,031
  • 1
  • 18
  • 25
  • I just saw that David Lukas links in his comment to the question to much better answers. – j.p. Apr 21 '22 at 10:15