4

A bit of background:

I am an amateur programmer, having picked up Haskell a few months ago, on my spare time, after a period of Mathematica programmning (my first language). I am currently going through my second Haskell book, by Will Kurt, but I still have miles to go to call myself comfortable around Haskell code. Codeabbey has been my platform for experimentation and learning so far.

I have written a piece of code to generate permutations of a given number, that deals with possible duplicate numbers, so for 588 it will internally generate 588, 858 and 885.

However, because I want to scale to pretty big input numbers (think perhaps even a hundred digits long), I don't want to output the whole list and then perform calculations on it, instead every number that is generated is checked on the spot for a certain property and if it has it, well, we have a winner, the number is returned as output and there's no need to go through the rest of the humongous list. If sadly no desired number is found and we unsuccessfully go through all possible permutations, it outputs a "0".

I have also opted to make it a command line program to feed values to it via gnu parallel for faster work.

So here is the code

import System.Environment

import Data.List 

toDigits :: Integer -> [Integer]
toDigits n = map (\n -> read [n]) (show n)

fromDigits :: Integral a => [a] -> Integer
fromDigits list = fromDigitsHelperFunction list 0

fromDigitsHelperFunction :: Integral a => [a] -> Integer -> Integer
fromDigitsHelperFunction [] acc = acc
fromDigitsHelperFunction (x:[]) acc = (fromIntegral x) + acc
fromDigitsHelperFunction digits@(x:xs) acc = fromDigitsHelperFunction xs (acc + ((fromIntegral x) * 10 ^((length digits) - 1 )))

testPermutationsWithRepetition :: ([Integer],Int,[Int],[(Int,Integer)]) -> [Integer]
testPermutationsWithRepetition (digits, index, rotationMap, registeredPositions)
   | index == 0 && rotationMap !! index == 0                                    = [0,0,0] --finish state (no more recursion). Nothing more to do
   | index == digitsLength - 1 && beautyCheck (fromDigits digits)           = digits
   | index == digitsLength - 1                                                  = testPermutationsWithRepetition (digits, index-1, rotationMap, registeredPositions)
   | not ((index,digits!!index) `elem` registeredPositions)                     = testPermutationsWithRepetition (digits, index+1, rotationMap, (index,digits!!index):registeredPositions)
   | rotationMap!!index == 0                                                    = testPermutationsWithRepetition (digits, index-1, restoredRotMap, restoredRegPositions)
   | rotationMap!!index > 0 && (index,digits!!index) `elem` registeredPositions = testPermutationsWithRepetition (shiftLDigits, index, subtractRot, registeredPositions)
   where digitsLength = length digits
         shiftLDigits = (fst splitDigits) ++ (tail $ snd splitDigits) ++ [head $ snd splitDigits]
         splitDigits = splitAt index digits
         restoredRotMap = (fst splitRotMap) ++ [digitsLength - index] ++ (tail $ snd splitRotMap)
         splitRotMap = splitAt index rotationMap
         restoredRegPositions = filter (\pos -> fst pos < index) registeredPositions  --clear everything below the parent index
         subtractRot = (fst splitRotMap) ++ [(head $ snd splitRotMap) - 1] ++ (tail $ snd splitRotMap)


--Frontend function for testing permutations by inputting a single parameter (the number in digit form)
testPermsWithRep :: [Integer] -> [Integer]
testPermsWithRep digits = testPermutationsWithRepetition (digits, 0, [length $ digits, (length $ digits) -1 .. 1], [])

main :: IO ()
main = do
   args <- getArgs
   let number = read (head args) :: Integer
   let checkResult = fromDigits $ testPermsWithRep $ toDigits number
   print checkResult

It's really a sequential process with an index variable that points to a certain number on the digit list and performs a recursive call on that list based on my rules. The functions tracks its progress through the digit list for visited numbers in certain positions so far (to avoid repetition following already visited paths until it gets to the last digit (index == length -1). If the number that we get there passes the beauty check, it exits with the number produced.

Now, in a Mathematica (or I guess any imperative language) I would probably implement this with a While loop and Cases for its checks, and by the logic of the program, however long it took to compute (generate the permutations and check them for validity) it would take a moderate amount of memory, just enough to hold the list of "registeredPositions" really (you could call it the record of visited digits in specific positions, so it's a variable list as we go deeper in index but gets cleaned up as we move back up). However in this case, the recursive calls stack up as it seems and the whole thing acts as a fork bomb for sufficiently large numbers (e.g 27777772222222222222222223333) and eventually crashes. Is this behaviour something that can be handled differently in Haskell or is there no way to avoid the recursion and memory hogging?

I really like Haskell because the programs make logical sense, but I would like to use it also for cases like this where performance (and resources) matters.

As a side note, my brother pointed to this Algorithm to print all permutations with repetition of numbers in C that is reasonably fast (only generates a list though) and most importantly has minimal memory footprint, although I can tell there's also recursion used in it. Other that that I'm clueless when it comes to C and I would like to stick to Haskell, if it can do what I want at the end of the day, that is.

Any help is welcome. Have a good day!

Edit: Per Soleil's suggestion I update my post with additional info provided in the comments. Specifically:

  1. After compiling with "ghc checking_program.hs" I run the program with "./checking program 27777772222222222222222223333". On an i5 3470 with 4GB RAM it runs for about 10 minutes and exits with a segmentation fault. On my brothers 32GB machine he let it run until it took up 20GB of RAM. No need to go further I guess. My tests were on Ubuntu via Win10 WSL. His is bare Linux

  2. testPermsWithRep is just a front end for testPermutationsWithRepetition, so that I can only provide the number and testPermsWithRep creates the initial parameters and calls testPermutationsWithRepetition with those. It outputs exactly what testPermutationsWithRepetition outputs, either a number (in digit form) that passes the test, or [0,0,0]. Now the test, the beautyCheck function is simply a test for single digit divisors of that number, that returns True or False. I didn't include it because it really is inconsequential. It could even be just a "bigger than x number" test.

An an example, calling "testPermsWithRep [2,6,7,3]" will call "testPermutationsWithRepetition ([2,6,7,3], 0, [4,3,2,1],[])" and whatever comes out of that function, testPermsWithRep will return that as well.

  • so what command exactly did you run and what error exactly did it give you? – Will Ness May 09 '21 at 09:16
  • After compiling with "ghc checking_program.hs" I run with "./checking program 27777772222222222222222223333". On an i5 3470 with 4GB RAM it runs for about 10 minutes and exits with a segmentation fault. On my brothers 32GB machine he let it run until it took up 20GB of RAM. No need to go further I guess. My tests were on Ubuntu via Win10 WSL. His bare Linux – Κωστής Καρβουνιάρης May 09 '21 at 09:28
  • `testPermsWithRep` accepts a list of digits representing an integer number, correct? what does it output, a permutation of those digits, of the same length? that passes some test? what is that test? please clarify. – Will Ness May 09 '21 at 09:33
  • testPermsWithRep is just a front end for testPermutationsWithRepetition, so that I can only provide the number and testPermsWithRep creates the initial parameters and calls testPermutationsWithRepetition with those. It outputs exactly what testPermutationsWithRepetition outputs, either a number (in digit form) that passes the test, or [0,0,0]. Now the test, the beautyCheck function is simply a test for single digit divisiors of that number, that returns True or False. I didn't include it because it really is inconsequential. It could even be just a "bigger than x number" test. – Κωστής Καρβουνιάρης May 09 '21 at 09:41
  • So, there's no alternative in Haskell? It's recursion all the way and nothing else? So it's all principles and no performance computing? If that's so, I 'd be disappointed. Also, I don't know OCaml or F#, and taking the plunge on the hopes it "might" work better is far fetched imo – Κωστής Καρβουνιάρης May 09 '21 at 10:34
  • *You can also call Haskell's GC or tune it to "collect" more often or so.* How would I go about doing that? Is it a flag I'm supposed to use when compiling? – Κωστής Καρβουνιάρης May 09 '21 at 10:43
  • About GC http://blog.ezyang.com/2011/04/the-haskell-heap/ and https://hackage.haskell.org/package/base-4.15.0.0/docs/System-Mem.html – Soleil May 09 '21 at 10:47
  • 1
    have patience, hopefully someone will answer this. :) in any case of course F# is very different from Haskell. one thing that jumps from your code is your use of list indexing, `!!`, and that is never a good sign performance-wise. nor is it an idiomatic Haskell way to do almost anything. – Will Ness May 09 '21 at 11:17
  • ok, so what are the numbers that go into the test? permutations of the original with duplicate digits allowed? please clarify. – Will Ness May 09 '21 at 11:20
  • and if so, try `main = print . head . (++ [[0,0,0]]) . filter theTest . permutations $ yourInputIntegerNumer`. see https://hoogle.haskell.org/?hoogle=permutations or https://hackage.haskell.org/package/base-4.15.0.0/docs/Data-List.html#v:permutations. – Will Ness May 09 '21 at 11:22
  • *permutations of the original with duplicate digits allowed?* Exactly that. So, if I call the program with 2778 it will go through 12 permutations of that number (4! / 2!), like [2778, 2787 .. 8277] and check each of them. But as I said earlier it will exit as soon as it finds the first permutation that passes the test – Κωστής Καρβουνιάρης May 09 '21 at 11:26
  • 1
    ok, so forget your code, try mine. you might need to `import Data.List (permutations)` first. see how is _that code_ performing. if it runs in constant memory, then Haskell is OK, and it;s just your code that has some problems. see also https://www.google.com/search?q=run+permutations+in+constant+memory+haskell – Will Ness May 09 '21 at 11:29
  • I actively avoided the "permutations" function because it doesn't take duplicate digits into account. If I run "permutations [2,7,7,8] on ghci it will return a list of 24 (4!) results, not 12. The numbers I'm going to check are guaranteed to have repeated digits, lots of them, and the difference gets huge the more digits there are in the input number. That's a LOT of duplicate work. That's why I got into the trouble of writing an algorithm myself. I couldn't find (or perhaps understand) sth similar on stackexchange – Κωστής Καρβουνιάρης May 09 '21 at 11:34
  • *see also google.com/search?q=run+permutations+in+constant+memory+haskell* Thanks for the suggestion! I will look into it – Κωστής Καρβουνιάρης May 09 '21 at 11:37
  • Could you include some examples of input, processing, and the corresponding output, so that it's a bit easier to understand the algorithm you're using? – AMC May 11 '21 at 14:09
  • @AMC Try it on paper and you'll get it. Calling the function for, say 588, will call the main function with ([5,8,8],0,[3,2,1],[]). Then apply the rules one by one. For example, for step one, index will become +1 and (0,5) will be added to registeredPositions. Then it will be called again with ([5,8,8],1,[3,2,1],[(0,5)]). And so on. Better still to see my revised code. For 588 it takes 27 "steps" to finish, if memory serves, but you won't need that many to understand where this is going – Κωστής Καρβουνιάρης May 13 '21 at 06:56

4 Answers4

5

The performance issue with your program doesn't have anything to do with recursion. Rather, you seem to be running up against an accumulation of a partially evaluated, lazy data structure in your rotation map. Your program will run in constant memory if you use the deepseq package to fully force evaluation of the restoredRotMap:

-- Install the `deepseq` package and add this import
import Control.DeepSeq

-- And then change this one case
... | rotationMap!!index == 0 = restoredRotMap `deepseq`
      testPermutationsWithRepetition (digits, index-1, restoredRotMap, restoredRegPositions)

Compiled with ghc -O2 and using beautyMap _ = False, this runs with a fixed resident memory usage of about 6 megs.

Some other performance targets:

  • You might want to replace most of your Integer types with Int, as this will be faster. I think you only need Integer for the input to toDigits and the output of fromDigits, and everything else can be Int, since it's all indexes and digits.
  • An even bigger win will be to replace your rotation map and registered positions with better data structures. If you find yourself splicing up lists with lots of listpart1 ++ [x] ++ listpart2 calls, there are going to be enormous performance costs to that, and the linear lookups with (!!) aren't helping either.
K. A. Buhr
  • 45,621
  • 3
  • 45
  • 71
  • 1
    obligatory mention of "list zippers" i.e. the list split pairs with the first part of the split maintained in reverse. – Will Ness May 13 '21 at 12:58
3

So I am not 100% sure of this and I am also not 100% sure I understand your code. But as far as I understand you are generating permutations without duplicates and then you are checking for some predicate wanting whatever single number that fulfils it.

I think it should help to use as many of the prelude functions as possible because afaik then the compiler understands it can optimize recursion into a loop. As a rule of thumb I was taught to avoid explicit recursion as much as possible and instead use prelude functions like map, filter and fold. Mainly you avoid reinventing the wheel this way but there also should be a higher chance of the compiler optimizing things.

So to solve your problem try generating a list of all permutations, then filter it using filter and then just do take 1 if you want the result that is found first. Because of Haskell's lazy evaluation take 1 makes it so that we are interested only in the first x in (x:xs) that a filter would return. Therefore filter will keep dropping elements from the, again lazily evaluated, list of permutations and when it finds one it stops.

I found a permutation implementation on https://rosettacode.org/wiki/Permutations#Haskell and used it to try this call:

take 1 $ filter ((> 67890123456789012345) . fromDigits) $ permutations' $ toDigits 12345678901234567890

it has been running for like 20 minutes now and RAM usage has stayed around 230 MB.
I hope that has answered/helped you at least in some way.

+ a bonus tip: you can simplify your fromDigits to this beautiful thing:

fromDigits :: Integral a => [a] -> Integer
fromDigits = foldl shiftAndAdd 0
    where shiftAndAdd acc d = 10 * acc + fromIntegral d

EDIT:
I read some more of the comments and I see you care about ignoring duplicates but I am afraid you'll have to go smarter about that, since if I understand correctly your implementation still generates all the duplicates it only throws them away after checking if they are in a list (which has O(n) complexity). And when you only care about finding one permutation that fits your predicate you drop the not fitting ones anyway.

And people have already correctly pointed out that !! is generally also very bad.

lordQuick
  • 53
  • 5
  • Thanks for your reply! * if I understand correctly your implementation still generates all the duplicates it only throws them away* Not at all, no. If it did it wouldn't be any different from "permutations" of Data.List. On the contrary, it only creates unique permutations, not even going down the wrong paths. That's what registredPositions is for and it's not a list of permutations of the input number, it's a list of tuples of the kind (index, digit), like saying "I've encountered this digit at this index, don't go down that road again". It's a breadcrumbs kind of approach – Κωστής Καρβουνιάρης May 09 '21 at 14:05
  • I get that laziness in Haskell means that "take 1" of whatever list will stop the computation, but that's assuming there is a solution withing that list. If there isn't, I will needlessly go through a list of n! items, rechecking duplicates for a solution nowhere to be found, instead of a list of (n! / any-number-of-duplicates!). Again huge difference. So for 2778 my code will check exactly 12 permutations, not more. I'm not trying to be smart about a subject I know little about, I'm just stating what makes sense based on what I've learned so far about the language. I will try your code though – Κωστής Καρβουνιάρης May 09 '21 at 14:12
  • Also, what is an efficient way to replace "!!"? How else do I pick a certain element from a list? – Κωστής Καρβουνιάρης May 11 '21 at 06:08
  • 1
    @ΚωστήςΚαρβουνιάρης regarding `(!!)`, it is not really that you need to "pick" it differently, it *generally* means that your approach should be different - e.g. you should be using `map`. If you're using a list as a look-up table it may be ok, although for those purposes there are other data structures like [HashMap](https://hackage.haskell.org/package/unordered-containers) or vectors. Since Haskell list is like a linked list - not made for accessing single elements in the middle. If you need to hold more than just a few elements, it may be worth of looking into the other options. – lordQuick May 11 '21 at 18:18
  • @ΚωστήςΚαρβουνιάρης regarding permutations, I misunderstood what the `registredPositions` holds - that's actually a pretty clever solution! However, rather than a list of tuples I am already thinking of a vector of sets, where for every digit you store a set of indices that it has already been at (a vector of 10 sets of `Int`s). That would make it more efficient. But perhaps there is a even better solution. I realize I am not very helpful but I would also suggest trying to look into how you could maybe use things like `map`, `zipWith`, etc. or list comprehension more. – lordQuick May 11 '21 at 18:35
1

Thanks to everyone for your helpful answers and comments.

@lordQuick permuations used with filter is still terrible but that fromDigits code is a beauty, so I used it.

@k-a-buhr That's exactly what I did yesterday, also per others suggestion, I replaced all use of !! and ++. When I did the latter all memory problems disappeared. Wow! I mean I knew ++ is bad I just didn't realise how bad! We're talking orders of magnitude bad! 3M of RAM vs several GB. Also, valid point about integers. I will try that.

Oh, also a very important thing. I replaced recursive calls with until. This is the approach I would have followed in Mathematica (a NestWhile function to be exact), and I'm glad I found it in Haskell. It seemed to make things a bit faster too.

Anyway, the revised code, that solves my memory issues is here for anyone if interested.

{-compiled with "ghc -Rghc-timing -O2 checking_program_v3.hs"-}

import System.Environment
import Data.List


--A little help with triples
fstOfThree (a, _, _) = a
sndOfThree (_, b, _) = b
thrOfThree (_, _, c) = c

--And then some with quads
fstOfFour (a, _, _, _) = a
sndOfFour (_, b, _, _) = b
thrOfFour (_, _, c, _) = c

--This function is a single pass test for single digit factors
--It will be called as many times as needed by pryForSDFactors
trySingleDigitsFactors :: (Bool, Integer, [Integer]) -> (Bool, Integer, [Integer])
trySingleDigitsFactors (True, n, f) = (True, n, f)
trySingleDigitsFactors (b, n, []) = (b, n, [])
trySingleDigitsFactors (b, n, (f:fs))
   | mod n f == 0 = (True, div n f, fs)
   | otherwise = trySingleDigitsFactors (False, n, fs)

--This function will take a number and repeatedly divide by single digits till it gets to a single digit if possible
--Then it will return True
pryForSDFactors :: Integer -> Bool
pryForSDFactors n
   | sndOfThree sdfTry < 10 = True
   | fstOfThree sdfTry == True = pryForSDFactors $ sndOfThree sdfTry
   | otherwise = False
   where sdfTry = trySingleDigitsFactors (False, n, [7,5,3,2])

toDigits :: Integer -> [Integer]
toDigits n = map (\n -> read [n]) (show n)

fromDigits :: Integral a => [a] -> Integer
fromDigits = foldl shiftAndAdd 0
    where shiftAndAdd acc d = 10 * acc + fromIntegral d

replaceElementAtPos :: a -> Int -> [a] -> [a]
replaceElementAtPos newElement pos [] = []
replaceElementAtPos newElement 0 (x:xs) = newElement:xs
replaceElementAtPos newElement pos (x:xs) = x : replaceElementAtPos newElement (pos-1) xs

checkPermutationsStep :: ([Integer],Int,[Int],[(Int,Integer)]) -> ([Integer],Int,[Int],[(Int,Integer)])
checkPermutationsStep (digits, index, rotationMap, registeredPositions)
   | index == digitsLength - 1                                               = (digits, index-1, rotationMap, registeredPositions)
   | not ((index, digitAtIndex) `elem` registeredPositions)                  = (digits, index+1, rotationMap, (index,digitAtIndex):registeredPositions)
   | rotationAtIndex == 0                                                    = (digits, index-1, restoredRotMap, restoredRegPositions)
   | rotationAtIndex > 0 && (index, digitAtIndex) `elem` registeredPositions = (shiftLDigits, index, subtractRot, registeredPositions)
   where digitsLength = length digits
         digitAtIndex = head $ drop index digits
         rotationAtIndex = head $ drop index rotationMap
         --restoredRotMap = (fst splitRotMap) ++ [digitsLength - index] ++ (tail $ snd splitRotMap)
         restoredRotMap = replaceElementAtPos (digitsLength - index) index rotationMap
         --splitRotMap = splitAt index rotationMap
         restoredRegPositions = filter (\pos -> fst pos < index) registeredPositions  --clear everything below the parent index
         shiftLDigits = (fst splitDigits) ++ (tail $ snd splitDigits) ++ [head $ snd splitDigits]
         splitDigits = splitAt index digits
         --subtractRot = (fst splitRotMap) ++ [(head $ snd splitRotMap) - 1] ++ (tail $ snd splitRotMap)
         subtractRot = replaceElementAtPos (rotationDigitAtIndex - 1) index rotationMap
         rotationDigitAtIndex = head $ drop index rotationMap

checkConditions :: ([Integer],Int,[Int],[(Int,Integer)]) -> Bool
checkConditions (digits, index, rotationMap, registeredPositions)
   | (index == 0 && rotationAtIndex == 0) || ((index == (length digits) - 1) && pryForSDFactors (fromDigits digits)) = True
   | otherwise = False
   where rotationAtIndex = head $ drop index rotationMap

testPermsWithRep :: Integer -> Integer
testPermsWithRep n
   | sndOfFour computationResult == 0 && (head . thrOfFour) computationResult == 0 = 0
   | otherwise = (fromDigits . fstOfFour) computationResult
   where computationResult = until checkConditions checkPermutationsStep (digitsOfn, 0 , [digitsLength, digitsLength -1 .. 1], [])
         digitsOfn = toDigits n
         digitsLength = length digitsOfn

main :: IO ()
main = do
   args <- getArgs
   let inputNumber = read (head args) :: Integer
   let checkResult = testPermsWithRep inputNumber
   print checkResult

Now, bear in mind that this code, as I've mentioned, checks for a condition of each generated permutation (single digit factors) on the spot, and moves on if False, but it's pretty easy to repurpose it for output list generation.

Sure it's now just inefficient in terms of big O complexity (scales terribly), and I was at first thinking of replacing lists with Data.Map because that's what I've learned so far (though not so comfortable with maps yet).

I've also read that there's a more efficient replacement for read since that's also called a lot for numbers-to-digits conversions.

@ lordQuick I don't know about HashMaps or vectors yet but I'm still learning. Every little optimization will pay off in computation time because this is my first piece of "practical" code, not just Codeabbey credit

Cheers!

  • please give one sample `inputNumber` and the intended output produced from it. so far anything I tried had produced `0`. – Will Ness May 13 '21 at 13:08
  • @will-ness For the condition I've set, any power of 2 (or permutation of its digits) will do. Try for example 6875410 (a permutation of 2^20). The ```checkCondition``` function will break execution either when all unique permutations are tested and no match is found, outputting a 0, or a number is found that has all single digit factors, in which case that number is returned. – Κωστής Καρβουνιάρης May 13 '21 at 20:54
1

Here is a solution using a more efficient, insertion-based algorithm to compute unique permutations:

import Data.List

permutationsNub :: Eq a => [a] -> [[a]]
permutationsNub = foldr (concatMap . insert) [[]]
    where insert y = foldr combine [[y]] . (zip <*> tail . tails)
              where combine (x, xs) xss = (y : x : xs) :
                        if y == x then [] else map (x :) xss

headDef :: a -> [a] -> a
headDef x [] = x
headDef x (h : t) = h

fromDigits :: Integral a => [a] -> Integer
fromDigits = foldl1' ((+) . (10 *)) . map fromIntegral

toDigits :: Integer -> [Int]
toDigits = map (read . pure) . show

firstValidPermutation :: (Integer -> Bool) -> Integer -> Integer
firstValidPermutation p =
    headDef 0 .
    filter p .
    map fromDigits .
    permutationsNub .
    toDigits

The basic idea is that, given the unique permutations of a list's tail, we can compute the unique permutations of the whole list by inserting its head into all of the tail's permutations, in every position that doesn't follow an occurrence of the head (to avoid creating duplicates). From my tests, permutationsNub seems to be faster than permutations from Data.List even when the input contains no repetitions. However, unlike that function, it consumes its input eagerly and thus cannot handle an infinite input. Exercise: Prove this algorithm's correctness.

to be continued

peter pun
  • 384
  • 1
  • 8