Project Euler Question 14 (Collatz Problem)

Question

The following iterative sequence is defined for the set of positive integers:

n ->n/2 (n is even) n ->3n + 1 (n is odd)

Using the rule above and starting with 13, we generate the following sequence:

13 40 20 10 5 16 8 4 2 1 It can be seen that this sequence (starting at 13 and finishing at 1) contains 10 terms. Although it has not been proved yet (Collatz Problem), it is thought that all starting numbers finish at 1.

Which starting number, under one million, produces the longest chain?

NOTE: Once the chain starts the terms are allowed to go above one million.

I tried coding a solution to this in C using the bruteforce method. However, it seems that my program stalls when trying to calculate 113383. Please advise :)

#include <stdio.h>
#define LIMIT 1000000

int iteration(int value)
{
 if(value%2==0)
  return (value/2);
 else
  return (3*value+1);
}

int count_iterations(int value)
{
 int count=1;
 //printf("%d\n", value);
 while(value!=1)
 {
  value=iteration(value);
  //printf("%d\n", value);
  count++;
 }
 return count;
}

int main()
{
 int iteration_count=0, max=0;
 int i,count;


 for (i=1; i<LIMIT; i++)
 {
  printf("Current iteration : %d\n", i);
  iteration_count=count_iterations(i);
  if (iteration_count>max)
   {
   max=iteration_count;
   count=i;
   }

 }

 //iteration_count=count_iterations(113383); 
 printf("Count = %d\ni = %d\n",max,count);

}

See: http://stackoverflow.com/questions/2388254/code-golf-collatz-conjecture — Jonathan Leffler, Apr 15 '10 at 08:12
113383 requires 247 steps and the biggest number it generates is 2482111348. That does not fit into a signed 32-bit integer (so you get negative numbers and who knows what other problems). — Jonathan Leffler, Apr 15 '10 at 08:18
The biggest number generated starting with values up to 1 million is just under 57 billion. The largest number of steps is 524; it is not associated with the biggest number, though. I know which is the starting number - but you should be able to find out too. (Using `bc`, it took under 10 minutes to generate the list of values 'start = X, max = Y, count = Z', ensuring there was no superfluous output. Raw C should make mincemeat of that timing. But `bc` will do super-ridiculously large numbers without blinking; writing the C code to handle bigger than 64-bit integers is not trivial.) — Jonathan Leffler, Apr 15 '10 at 09:05
There's a [winning entry in the 2015 IOCCC](http://www.ioccc.org/years.html#2015) that computes Collatz sequences with arbitrary large values (aka bignums). Look for schweikhardt. What a shameless plug :-) — Jens, Sep 14 '16 at 06:55

score 14 · Accepted Answer · edited Dec 27 '12 at 15:56

14

The reason you're stalling is because you pass through a number greater than 2^31-1 (aka INT_MAX); try using unsigned long long instead of int.

I recently blogged about this; note that in C the naive iterative method is more than fast enough. For dynamic languages you may need to optimize by memoizing in order to obey the one minute rule (but this is not the case here).

Oops I did it again (this time examining further possible optimizations using C++).

edited Dec 27 '12 at 15:56

Lightness Races in Orbit

378,754
76
643
1,055

answered Apr 15 '10 at 07:54

Motti

110,860
49
189
262

I can't tell if @Motti is being serious. – Larry Apr 15 '10 at 13:58
@Larry, I was, apologies, English isn't my native language and I seem to have being misreading memoization (Chrome's spell checker didn't help). – Motti Apr 15 '10 at 14:33
Heh, I wasn't quite sure, wasn't trying to be snarky, doubly because it was "dynamic programming" but you had some other acronym for "DP". =) – Larry Apr 15 '10 at 15:28

score 9 · Answer 2 · answered Apr 15 '10 at 06:58

9

Notice that your brute force solution often computes the same subproblems over and over again. For example, if you start with 10, you get 5 16 8 4 2 1; but if you start with 20, you get 20 10 5 16 8 4 2 1. If you cache the value at 10 once it's computed, and then won't have to compute it all over again.

(This is known as dynamic programming.)

answered Apr 15 '10 at 06:58

Jesse Beder

33,081
21
109
146

I understand the concept but am not very sure how it translates into code. Perhaps a lookup table? – paradox Apr 15 '10 at 07:06
@Nick I disagree that "the proper term is". See the wiki article. Memoization is just *one* approach that can be utilized in DP. – Apr 15 '10 at 07:39
1

This is not the reason he's hanging, the reason is that `int` isn't big enough (see my answer with less votes below). – Motti Apr 15 '10 at 07:56
@pst, DP makes use of recursion and memoization techniques for finding optimal solutions. – Nick Dandoulakis Apr 15 '10 at 07:56
@Nick, I was taking a broader approach in my terminology. Yes, caching is known as memoization, but it all falls under the umbrella of dynamic programming, which is what he should study. (Of course, I completely missed the 32-bit int business, which was the actual problem.) This also raises the question of whether you should post an answer and then immediately go to sleep :) – Jesse Beder Apr 15 '10 at 16:57

score 1 · Answer 3 · edited Apr 15 '10 at 08:17

1

Having just tested it in C#, it appears that 113383 is the first value where the 32-bit int type becomes too small to store every step in the chain.

Try using an unsigned long when handling those big numbers ;)

edited Apr 15 '10 at 08:17

Jonathan Leffler

730,956
141
904
1,278

answered Apr 15 '10 at 07:49

Thibault Falise

5,795
2
29
32

True, but note that in C `long` may very well be the same as `int` and `long long` should be used. – Motti Apr 15 '10 at 08:01
@Motti: uintmax_t seems a reasonable choice. – Jonathan Leffler Apr 15 '10 at 08:17
@Jonathan, I wasn't familiar with `uintmax_t`. I see it's part of C99 which must be the reason for this, I use C++ which doesn't define it. – Motti Apr 15 '10 at 09:29
1

@Motti: fair enough - but the C++ standard doesn't define `long long` either :) – Jonathan Leffler Apr 15 '10 at 14:50

score 1 · Answer 4 · answered Apr 15 '10 at 07:55

1

I solved the problem some time ago and luckily still have my code. Do not read the code if you don't want a spoiler:

#include <stdio.h>

int lookup[1000000] = { 0 };

unsigned int NextNumber(unsigned int value) {
  if ((value % 2) == 0) value >>= 1;
  else value = (value * 3) + 1;
  return value;
}

int main() {
  int i = 0;
  int chainlength = 0;
  int longest = 0;
  int longestchain = 0;
  unsigned int value = 0;
  for (i = 1; i < 1000000; ++i) {
    chainlength = 0;
    value = i;
    while (value != 1) {
      ++chainlength;
      value = NextNumber(value);
      if (value >= 1000000) continue;
      if (lookup[value] != 0) {
        chainlength += lookup[value];
        break;
      }
    }

    lookup[i] = chainlength;

    if (longestchain < chainlength) {
      longest = i;
      longestchain = chainlength;
    }
  }
  printf("\n%d: %d\n", longest, longestchain);
}

time ./a.out

[don't be lazy, run it yourself]: [same here]

real    0m0.106s
user    0m0.094s
sys     0m0.012s

answered Apr 15 '10 at 07:55

value >>= 1 Is this shift left one bit and assign? – paradox Apr 15 '10 at 08:08
>>= 1 is divide by 2. It would be totally sufficient to just write /= 2 as the compiler would optimize himself, it's simple premature optimization on my part. – Apr 15 '10 at 08:12
I noticed a error in your code. Chainlength should start at 1 instead of 0. – paradox Apr 15 '10 at 08:20
Thanks for pointing it out. Anyways, as the question asks for the highest number, not the most steps the result will still be the same, just the number of steps might be off by one (and real programmers begin counting with 0 anyways ;-) ). – Apr 15 '10 at 08:27
I tried a similar program but originally used a hash instead of an array. And well.... it. Didn't work. Just kept running. – Tim Potapov Oct 09 '15 at 19:13

score 0 · Answer 5 · answered Apr 24 '14 at 06:58

My effort in C#, run time < 1 second using LinqPad:

var cache = new Dictionary<long, long>();
long highestcount = 0;
long highestvalue = 0;
for (long a = 1; a < 1000000; a++)
{
    long count = 0;
    long i = a;
    while (i != 1)
    {
        long cachedCount = 0;
        if (cache.TryGetValue(i, out cachedCount)) //See if current value has already had number of steps counted & stored in cache
        {
            count += cachedCount; //Current value found, return cached count for this value plus number of steps counted in current loop
            break;
        }
        if (i % 2 == 0)
            i = i / 2;
        else
            i = (3 * i) + 1;
        count++;
    }
    cache.Add(a, count); //Store number of steps counted for current value
    if (count > highestcount)
    {
        highestvalue = a;
        highestcount = count;
    }
}
Console.WriteLine("Starting number:" + highestvalue.ToString() + ", terms:" + highestcount.ToString());

score 0 · Answer 6 · answered Apr 25 '14 at 07:04

Fixing the unsigned int issue in the original question.

Added array for storing pre-computed values.

include <stdio.h>                                                                                                                                     
#define LIMIT 1000000

unsigned int dp_array[LIMIT+1];

unsigned int iteration(unsigned int value)
{
 if(value%2==0)
  return (value/2);
 else
  return (3*value+1);
}

unsigned int count_iterations(unsigned int value)
{
 int count=1;
 while(value!=1)
 {
 if ((value<=LIMIT) && (dp_array[value]!=0)){
   count+= (dp_array[value] -1);
   break;
  } else {
   value=iteration(value);
   count++;
  }
 }
 return count;
}

int main()
{
 int iteration_count=0, max=0;
 int i,count;
 for(i=0;i<=LIMIT;i++){
  dp_array[i]=0;
 }

 for (i=1; i<LIMIT; i++)
 {
//  printf("Current iteration : %d \t", i);
  iteration_count=count_iterations(i);
  dp_array[i]=iteration_count;
//  printf(" %d \t", iteration_count);
  if (iteration_count>max)
   {
   max=iteration_count;
   count=i;
   }
//  printf(" %d \n", max);

 }

 printf("Count = %d\ni = %d\n",max,count);

}

o/p: Count = 525 i = 837799

score 0 · Answer 7 · answered Apr 15 '10 at 07:35

As has been said, the simplest way is to get some memoization to avoid recomputing things that haven't been computed. You might be interested in knowing that there is no cycle if you being from a number under one million (no cycle has been discovered yet, and people have explored much bigger numbers).

To translate it in code, you can think the python way:

MEMOIZER = dict()

def memo(x, func):
  global MEMOIZER
  if x in MEMOIZER: return MEMOIZER[x]

  r = func(x)

  MEMOIZER[x] = r

  return r

Memoization is a very generic scheme.

For the Collatze conjecture, you might run in a bit of a pinch because numbers can really grow and therefore you might blow up the available memory.

This is traditionally handled using caching, you only cache the last n results (tailored to occupy a given amount of memory) and when you already have n items cached and wish to add a newer one, you discard the older one.

For this conjecture there might be another strategy available, though a bit harder to implement. The basic idea is that you have only ways to reach a given number x:

from 2*x
from (x-1)/3

Therefore if you cache the results of 2*x and (x-1)/3 there is no point in caching x any longer >> it'll never get called anymore (except if you wish to print the sequence at the end... but it's only once). I leave it to you to take advantage of this so that your cache does not grow too much :)

tphyahoo · Answer 8 · 2012-11-06T16:32:33.043

Haskell solution, 2 second run time.

thomashartman@yucca:~/collatz>ghc -O3 -fforce-recomp --make collatz.hs
[1 of 1] Compiling Main             ( collatz.hs, collatz.o )
Linking collatz ...
thomashartman@yucca:~/collatz>time ./collatz
SPOILER REDACTED
real    0m2.881s

-- Maybe I could have gotten it a bit faster using a hash instead of a map.

import qualified Data.Map as M
import Control.Monad.State.Strict
import Data.List (maximumBy)
import Data.Function (on)

nextCollatz :: Integer -> Integer
nextCollatz n | even n = n `div` 2
               | otherwise = 3 * n + 1

newtype CollatzLength = CollatzLength Integer
  deriving (Read,Show,Eq,Ord)

main = print longestCollatzSequenceUnderAMill 
longestCollatzSequenceUnderAMill = longestCollatzLength [1..1000000]
-- sanity checks
tCollatzLengthNaive = CollatzLength 10 == collatzLengthNaive 13 
tCollatzLengthMemoized = (CollatzLength 10) == evalState (collatzLengthMemoized 13) M.empty

-- theoretically could be nonterminating. Since we're not in Agda, we'll not worry about it.
collatzLengthNaive :: Integer -> CollatzLength
collatzLengthNaive 1 = CollatzLength 1
collatzLengthNaive n = let CollatzLength nextLength = collatzLengthNaive (nextCollatz n)
                       in  CollatzLength $ 1 + nextLength

-- maybe it would be better to use hash here?
type CollatzLengthDb = M.Map Integer CollatzLength
type CollatzLengthState = State CollatzLengthDb 

-- handy for testing
cLM :: Integer -> CollatzLength
cLM n = flip evalState M.empty $ (collatzLengthMemoized n) 

collatzLengthMemoized :: Integer -> CollatzLengthState CollatzLength
collatzLengthMemoized 1 = return $ CollatzLength 1
collatzLengthMemoized n = do
  lengthsdb <- get
  case M.lookup n lengthsdb of 
    Nothing -> do let n' = nextCollatz n 
                  CollatzLength lengthN' <- collatzLengthMemoized n'
                  put $ M.insert n' (CollatzLength lengthN') lengthsdb
                  return $ CollatzLength $ lengthN' + 1
    Just lengthN -> return lengthN

longestCollatzLength :: [Integer] -> (Integer,CollatzLength)
longestCollatzLength xs = flip evalState M.empty $ do 
  foldM f (1,CollatzLength 1) xs
  where f maxSoFar@(maxN,lengthMaxN) nextN = do
          lengthNextN <- collatzLengthMemoized nextN
          let newMaxCandidate = (nextN,lengthNextN)
          return $ maximumBy (compare `on` snd) [maxSoFar, newMaxCandidate]

================================================================================

And here is another haskell solution, using monad-memo package. Unfortunately, this one has a stack space error that does not affect the rolled-my-own memoizer above.

./collatzMemo +RTS -K83886080 -RTS # this produces the answer, but it would be bettter to eliminate the space leak

{-# Language GADTs, TypeOperators #-} 

import Control.Monad.Memo
import Data.List (maximumBy)
import Data.Function (on)

nextCollatz :: Integer -> Integer
nextCollatz n | even n = n `div` 2
               | otherwise = 3 * n + 1

newtype CollatzLength = CollatzLength Integer
  deriving (Read,Show,Eq,Ord)

main = print longestCollatzSequenceUnderAMill 
longestCollatzSequenceUnderAMill = longestCollatzLength [1..1000000]

collatzLengthMemoized :: Integer -> Memo Integer CollatzLength CollatzLength
collatzLengthMemoized 1 = return $ CollatzLength 1
collatzLengthMemoized n = do
  CollatzLength nextLength <- memo collatzLengthMemoized (nextCollatz n)
  return $ CollatzLength $ 1 + nextLength 
{- Stack space error
./collatzMemo
Stack space overflow: current size 8388608 bytes.
Use `+RTS -Ksize -RTS' to increase it.

Stack error does not effect rolled-my-own memoizer at
http://stackoverflow.com/questions/2643260/project-euler-question-14-collatz-problem
-}
longestCollatzLength :: [Integer] -> (Integer,CollatzLength)
longestCollatzLength xs = startEvalMemo $ do
  foldM f (1,CollatzLength 1) xs
  where f maxSoFar nextN = do
          lengthNextN <- collatzLengthMemoized nextN
          let newMaxCandidate = (nextN,lengthNextN)
          return $ maximumBy (compare `on` snd) [maxSoFar, newMaxCandidate]

{-
-- sanity checks
tCollatzLengthNaive = CollatzLength 10 == collatzLengthNaive 13 
tCollatzLengthMemoized = (CollatzLength 10) ==startEvalMemo (collatzLengthMemoized 13) 

-- theoretically could be nonterminating. Since we're not in Agda, we'll not worry about it.
collatzLengthNaive :: Integer -> CollatzLength
collatzLengthNaive 1 = CollatzLength 1
collatzLengthNaive n = let CollatzLength nextLength = collatzLengthNaive (nextCollatz n)
                       in  CollatzLength $ 1 + nextLength
-}

==================================================

another one, factored more nicely. doesn't run as fast but still well under a minute

import qualified Data.Map as M
import Control.Monad.State
import Data.List (maximumBy, nubBy)
import Data.Function (on)

nextCollatz :: Integer -> Integer
nextCollatz n | even n = n `div` 2
               | otherwise = 3 * n + 1

newtype CollatzLength = CollatzLength Integer
  deriving (Read,Show,Eq,Ord)

main = print longestCollatzSequenceUnderAMillStreamy -- AllAtOnce                                                                                                                                                                                                         

collatzes = evalState collatzesM M.empty
longestCollatzSequenceUnderAMillAllAtOnce = winners . takeWhile ((<=1000000) .fst) $ collatzes
longestCollatzSequenceUnderAMillStreamy = takeWhile ((<=1000000) .fst) . winners  $ collatzes


-- sanity checks                                                                                                                                                                                                                                                          
tCollatzLengthNaive = CollatzLength 10 == collatzLengthNaive 13
tCollatzLengthMemoized = (CollatzLength 10) == evalState (collatzLengthMemoized 13) M.empty

-- maybe it would be better to use hash here?                                                                                                                                                                                                                             
type CollatzLengthDb = M.Map Integer CollatzLength
type CollatzLengthState = State CollatzLengthDb

collatzLengthMemoized :: Integer -> CollatzLengthState CollatzLength
collatzLengthMemoized 1 = return $ CollatzLength 1
collatzLengthMemoized n = do
  lengthsdb <- get
  case M.lookup n lengthsdb of
    Nothing -> do let n' = nextCollatz n
                  CollatzLength lengthN' <- collatzLengthMemoized n'
                  put $ M.insert n' (CollatzLength lengthN') lengthsdb
                  return $ CollatzLength $ lengthN' + 1
    Just lengthN -> return lengthN

collatzesM :: CollatzLengthState [(Integer,CollatzLength)]
collatzesM = mapM (\x -> do (CollatzLength l) <- collatzLengthMemoized x
                            return (x,(CollatzLength l)) ) [1..]

winners :: Ord b => [(a, b)] -> [(a, b)]
winners xs = (nubBy ( (==) `on` snd )) $ scanl1 (maxBy snd) xs

maxBy :: Ord b => (a -> b) -> a -> a -> a
maxBy f x y = if f x > f y then x else y

Project Euler Question 14 (Collatz Problem)

8 Answers8

another one, factored more nicely. doesn't run as fast but still well under a minute

Linked

Related