Generate a stream of all possible numbers `x^k*y^l` for given `k` and `l` in Haskell

Question

Generate the function generateExponents k l, which for given k and l generates a stream of all unique possible numbers x^k*y^l in increasing order. For example generateExponents 2 3 = [1,4,8,9,16,25,27...]

For obvious reasons this doesn't work:

generateExponents k l = sort [x^k*y^l | x <- [1..], y <- [1..]]

Then I tried this, which doesn't work either:

generateExponents k l = [n | n <- [1 ..], n `elem` products n]
  where
    xs n = takeWhile (\x -> x ^ k <= n) [1 ..]
    ys n = takeWhile (\y -> y ^ l <= n) [1 ..]
    products n = liftA2 (*) (xs n) (ys n)

What am I doing wrong?

score 4 · Accepted Answer · answered Feb 06 '23 at 17:30

4

Your algorithm is pretty slow -- it checks every number, and for every number it searches for an appropriate factorization! You can do better by producing an infinite table of answers, and then collapsing the table appropriately. For example, for x^2*y^3, the table looks like:

 x  1    2    3    4    5
y
1   1    4    9   16   25
2   8   32   72  128  200
3  27  108  243  432  675
4  64  256  576 1024 1600
5 125  500 1125 2000 3125

Note two nice features of this table: each row is sorted, and the rows themselves are sorted. This means we can merge them efficiently by simply taking the top-left value, then re-inserting the tail of the first row in its new sorted position. For example, the table above, after emitting 1, would look like:

  4    9   16   25   36
  8   32   72  128  200
 27  108  243  432  675
 64  256  576 1024 1600
125  500 1125 2000 3125

Then, after emitting the top-left value 4:

  8   32   72  128  200
  9   16   25   36   49
 27  108  243  432  675
 64  256  576 1024 1600
125  500 1125 2000 3125

Note how the top row has now become the second row to keep the doubly-sorted property.

This is an efficient way to construct all the right numbers in the right order. Then, the only remaining trick needed is to deduplicate, and for that you can deploy the standard trick map head . group, since duplicates are guaranteed to be next to each other. Here's the full code:

import Data.List

generateExponents' k l = map head . group . go $ [[x^k*y^l | x <- [1..]] | y <- [1..]] where
    go ((x:xs):xss) = x:go (insert xs xss)

It's much, much faster. Compare:

> sum . take 400 $ generateExponents 2 3
5994260
(8.26 secs, 23,596,249,112 bytes)
> sum . take 400 $ generateExponents' 2 3
5994260
(0.01 secs, 1,172,864 bytes)
> sum . take 1000000 {- a million -} $ generateExponents' 2 3
72001360441854395
(6.99 secs, 13,460,753,616 bytes)

answered Feb 06 '23 at 17:30

Daniel Wagner

145,880
9
220
380

Wow, that's amazing! How do you notice such dependencies? I mean, it's obvious that for natural numbers the function x^k*y^l is increasing, yet I would have never thought of representing the data as a table to notice these properties. Is there any written theory in one place on such algorithms? – Kotaka Danski Feb 06 '23 at 17:53
2

Here's another way using `merge` from mergesort instead of `insert`: `generateExponents k l = foldr1 (\(x:xs) ys -> x : merge xs ys) [[x^k*y^l | x <- [1..]] | y <- [1..]]`. – Noughtmare Feb 06 '23 at 18:05
@Noughtmare As a merge sort enjoyer, I like your variant, too. :-D – Kotaka Danski Feb 06 '23 at 18:38
@Noughtmare I'm having trouble finding that function, can you link the docs? (Brilliant solution Daniel and Nought!) – L0neGamer Feb 06 '23 at 19:20
1

@L0neGamer it's not in `base`, but it is very easy to write. You can [find it in `MissingH`](https://hackage.haskell.org/package/MissingH-1.6.0.0/docs/src/Data.List.Utils.html#merge). – Noughtmare Feb 06 '23 at 19:34
2

@KotakaDanski I've spent [a lot of time thinking about such tables](https://stackoverflow.com/a/46900780/791604), and there's well-known algorithms of this kind for [very similar problems](https://stackoverflow.com/q/12480291/791604). – Daniel Wagner Feb 06 '23 at 20:40
3

@Noughtmare If you wanna go full galaxy-brain, you'll want to do a [balanced tree fold](https://gist.github.com/dmwit/81d5f8e99612847a7611a66895863eb8) rather than a right fold. (Not the exact fold in that link, since I think it isn't lazy enough, but using that idea, of merging pairwise lists repeatedly.) – Daniel Wagner Feb 06 '23 at 20:49
@Noughtmare Actually, in my (admittedly very informal) tests, the `insert` version is the fastest! 6.5s for a million elements, vs 8.9s with a tree fold and 19s with `foldr`. – Daniel Wagner Feb 06 '23 at 21:41
2

@DanielWagner My benchmark tells me that my merge version is faster: https://gist.github.com/noughtmare/dd2975ad03ac3ee5ed5cab6365fdf376 – Noughtmare Feb 06 '23 at 22:04
1

@Noughtmare Welp, just goes to show you gotta actually compile with optimizations to know for real. I feel like I've told newbs that enough times that I shouldn't have made that mistake, but, well... here we are. I'm excited that my intuition about `foldb` was good, though; when I turn `n` up to a million in your benchmark, it's ten times faster than the `foldr` version on my machine (this time compiling with optimizations so I don't stick my foot in my mouth a second time...)! – Daniel Wagner Feb 06 '23 at 22:16
1

@L0neGamer actually, about `merge`, you do need a special version that drops duplicates. The one in `MissingH` keeps duplicates. You can find the duplicate-dropping version in the gist I linked above. – Noughtmare Feb 06 '23 at 22:20
@Noughtmare how come duplicates matter? You're comparing infinite lists; either one is greater than the other, or it does not terminate. – L0neGamer Feb 07 '23 at 07:54
@L0neGamer I guess it depends on the original problem description. The code Daniel Wagner shows here assumes duplicates need to be removed (otherwise you could leave out that `map head . group` part. – Noughtmare Feb 07 '23 at 10:24
@L0neGamer Because, for example, 8^2 \* 9^3 = 27^2 \* 4^3. Or, for a simpler but more confusing example, because 2^1 \* 2^1 = 2^1 \* 2^1. – Daniel Wagner Feb 07 '23 at 14:17

score 1 · Answer 2 · answered Feb 06 '23 at 17:01

1

I think you just forgot to map the actual function over the xs and ys:

generateExponents k l = [n | n <- [1 ..], n `elem` products n]
  where
    xs n = takeWhile (<= n) $ map (^ k) [1 ..]
    ys n = takeWhile (<= n) $ map (^ l) [1 ..]
    products n = liftA2 (*) (xs n) (ys n)

answered Feb 06 '23 at 17:01

Noughtmare

9,410
1
12
38

Your suggestion solved it. Thank you! – Kotaka Danski Feb 06 '23 at 17:11

Generate a stream of all possible numbers `x^k*y^l` for given `k` and `l` in Haskell

2 Answers2