3

Answers So Far

So here is the code breakdown.

//Time: ~7s (linear loop algorithm)
//100,000! (456,574 decimal digits)
BigInteger bigIntVar = computeFactorial(100000);

//The first three here are just for comparison and are not actually Base 10.
bigIntVar.ToBase64String() //Time: 00.001s | Base 64 | Tetrasexagesimal
bigIntVar.ToString("x")    //Time: 00.016s | Base 16 | Hexadecimal
bigIntVar.ToBinaryString() //Time: 00.026s | Base 02 | Binary
bigIntVar.ToQuickString()  //Time: 11.200s | Base 10 | String Version
bigIntVar.ToQuickString()  //Time: 12.500s | Base 10 | StringBuilder Version
bigIntVar.ToString()       //Time: 13.300s | Base 10 | Original

Original Question Stuff

I have spent way to much time on this, so I need your help.

This is for a personal project to compute ginormous factorials (ex. 100,000!)

Here is my code:

using (var stream = new StreamWriter(fileName + ".txt", false))
{
    stream.WriteLine(header);

    var timer = new Stopwatch();    
    timer.Restart();
    //This is the huge BigInteger holding the answer to 100,000!
    stream.WriteLine(saveFactorial.Output.ToString());         
    //Let me be clear: ToString() is directly causing the the 13sec time delay.
    //Not the stream.
    timer.Stop();                   
}

time = (timer.ElapsedMilliseconds / 1000.0).ToString() + "s"; 

MessageBox.Show(time);

On 100,000! is takes about 7sec on my machine to compute (linear loop algorithm).

Yet with this standard IO code it takes 13sec to save.

So in other words, it takes longer to save the work than it does to modestly compute it.

So I thought maybe I could use:

BigInteger.ToByteArray();

Although this runs extremely fast, I couldn't figure out how to save it to readable text.

You can use the above method to write the binary string to a text file with this self-made extension:

ToBinaryString

//Usage: string bigIntBinary = bigIntVar.ToBinaryString();
public static string ToBinaryString(this BigInteger source)
{
    //If you lookup the ToByteArray() method...
    //It actually stores the bytes in reverse order.
    var bigIntBytes = source.ToByteArray().Reverse();

    StringBuilder bigIntBinary = new StringBuilder();

    foreach (var bigIntByte in bigIntBytes)
    {
       bigIntBinary.Append(Convert.ToString(bigIntByte, 2).PadLeft(8, '0'));
    }

    return bigIntBinary.ToString();
}

ToBase64String

    ////Usage: string bigIntBase64 = bigIntVar.ToBase64String();
    public static string ToBase64String(this BigInteger source)
    {
        var bigIntBytes = source.ToByteArray().Reverse().ToArray();

        return Convert.ToBase64String(bigIntBytes);
    }

I also tried the math way (mod 10, etc...) to get each digit, but that takes a TON more time that ToString().

What am I doing wrong here?


This code is what I came up with based on the answer below. This is faster than ToString(), but only by a couple seconds.

ToQuickString

//Usage: string bigIntString = bigIntVar.ToQuickString()
public static String ToQuickString(this BigInteger source)
{
    powersOfTen = new List<BigInteger>();

    powersOfTen.Add(1);

    for (BigInteger i = 10; i < source; i *= i)
    {
        powersOfTen.Add(i);
    }

    return BuildString(source, powersOfTen.Count - 1).ToString().TrimStart('0');
}

private static List<BigInteger> powersOfTen;

private static string BuildString(BigInteger n, int m)
{
    if (m == 0)
        return n.ToString();

    BigInteger remainder;
    BigInteger quotient = BigInteger.DivRem(n, powersOfTen[m], out remainder);

    return BuildString(quotient, m - 1) + BuildString(remainder, m - 1);
}
user1787963
  • 13
  • 1
  • 5
  • It seems that it's the call to `ToString` that takes quite a while, but if you want the file to be human-readable there's not much you can do about that. – Rawling Oct 31 '12 at 09:06
  • Yea I edited to make that clear. I debugged it for quite some time and found that ToString() and nothing else was causing the time delay. Is there no quicker way to turn a BigInteger into a string? – user1787963 Oct 31 '12 at 09:07
  • 3
    I doubt it. Do you really need these to be human-readable? Nobody's going to sit down and read a 450k-character number, are they? – Rawling Oct 31 '12 at 09:10
  • Well I have a 1GB text file with 1billion digits of Pi... This was more of a personal thing. I just like knowing that I have a 450k digit number that means something on my computer. – user1787963 Oct 31 '12 at 09:19
  • Hah fair enough :) If it would mean the same to you if the file is in readable _hexadecimal_, calling `.ToString("x")` is _much_ quicker. – Rawling Oct 31 '12 at 09:21
  • 2 -> 4 -> 8 -> 6 -> 2.. binary never hits 0 in decimal, ie no way to optimize it into decimal notation. Save it in binary (or hex (`.ToString("X")`)) :) Cause no one is really going to read it right? – flindeberg Oct 31 '12 at 09:22
  • In your ToBinaryString, remove the .ToList() after Reverse(). ToList() creates a copy of the data. You don't need a copy of the data, the ForEach can just operate directly on the IEnumerable<> returned by Reverse(). – dthorpe Nov 02 '12 at 22:07

2 Answers2

2

First I'd calculate all numbers of the form 10^(2^m) smaller than n. Then I'd use DivRem with the largest of these to split the problem into two subproblems. Repeat that recursively until you're down to individual digits.

var powersOfTen=new List<BigInteger>();
powersOfTen.Add(1);
for(BigInteger i=10;i<n;i=i*i)
  powersOfTen.Add(i);

string ToString(BigInteger n, int m)
{
  if(m==0)
    return n.ToString();
  quotient = DivRem(n,powersOfTen[m], remainder)
  return ToString(quotient, m-1)+ToString(remainder, m-1)
}

You can also optimize out the string concatenation entirely by directly writing into a character array.


Alternatively you could consider using base 1000'000'000 during all the calculations. That way you don't need the base conversion in the end at all. That's probably much faster for factorial calculation.

List<int> multiply(List<int> f1, int f2)
{
  int carry=0;
  for(int i=0;i<f1.Count;i++)
  {
    var product=(Int64)f1[i]*(Int64)f2;
    carry=product/1000000000;
    result.Add(product%1000000000);
  }
  if(carry!=0)
    result.Add(carry);
}

Now conversion to a base 10 string is trivial and cheap.

CodesInChaos
  • 106,488
  • 23
  • 218
  • 262
  • Gone to bookmark this and try this out soon. Need some bed time right now though. I really like this idea. Although recursion will probably lead to a StackOverflow due to the 450k individual digits. – user1787963 Oct 31 '12 at 09:51
  • 2
    No it will not cause a stack overflow. The recursion depth is logarithmic in the number of digits. – CodesInChaos Oct 31 '12 at 09:53
  • Just tested this code. Let me know if I implemented this wrong though. I don't think I did. I didn't have time to make it "proper" code though. It was just a quick put copy of you did. – user1787963 Nov 02 '12 at 17:19
  • I just wanted to add that currently it is placing all the trailing zeros at both ends of the number. I may have done something wrong with the code lol. – user1787963 Nov 02 '12 at 17:27
1

Save the BigInteger data in binary or hex format. It is readable to the computer, and to sufficiently dedicated humans. ;>

Spending extra effort to make the output "human readable" is a waste of time. No human is going to be able to make sense out of 450,000 digits regardless of whether they are base 10, base 16, base 2, or anything else.

UPDATE

Looking into the Base 10 conversion a little more closely, it is possible to cut the baseline performance of ToString almost in half using multiple threads on a multi core system. The main obstacle is that the largest consumer of time across the entire decimalization process is the first division operation on the original 450k digit number.

Stats on my quad core P7: 
Generating a 500k digit random number using power and multiply: 5 seconds
Dividing that big number by anything just once: 11 seconds
ToString(): 22 seconds
ToQuickString: 18 seconds
ToStringMT: 12.9 seconds

.

public static class BigIntExtensions
{
    private static List<BigInteger> powersOfTen;

    // Must be called before ToStringMt()
    public static void InitPowersOfTen(BigInteger n)
    {
        powersOfTen = new List<BigInteger>();

        powersOfTen.Add(1);

        for (BigInteger i = 10; i < n; i *= i)
            powersOfTen.Add(i);
    }

    public static string ToStringMT(this BigInteger n)
    {
        // compute the index into the powersOfTen table for the given parameter. This is very fast.
        var m = (int)Math.Ceiling(Math.Log(BigInteger.Log10(n), 2));

        BigInteger r1;
        // the largest amount of execution time happens right here:
        BigInteger q1 = BigInteger.DivRem(n, BigIntExtensions.powersOfTen[m], out r1);

        // split the remaining work across 4 threads - 3 new threads plus the current thread
        var t1 = Task.Factory.StartNew<string>(() =>
        {
            BigInteger r1r2;
            BigInteger r1q2 = BigInteger.DivRem(r1, BigIntExtensions.powersOfTen[m - 1], out r1r2);
            var t2 = Task.Factory.StartNew<string>(() => BuildString(r1r2, m - 2));
            return BuildString(r1q2, m - 2) + t2.Result;
        });
        BigInteger q1r2;
        BigInteger q1q2 = BigInteger.DivRem(q1, BigIntExtensions.powersOfTen[m - 1], out q1r2);
        var t3 = Task.Factory.StartNew<string>(() => BuildString(q1r2, m - 2));
        var sb = new StringBuilder();
        sb.Append(BuildString(q1q2, m - 2));
        sb.Append(t3.Result);
        sb.Append(t1.Result);
        return sb.ToString();
    }

    // same as ToQuickString, but bails out before m == 0 to reduce call overhead.
    // BigInteger.ToString() is faster than DivRem for smallish numbers.
    private static string BuildString(BigInteger n, int m)
    {
        if (m <= 8)
            return n.ToString();

        BigInteger remainder;
        BigInteger quotient = BigInteger.DivRem(n, powersOfTen[m], out remainder);
        return BuildString(quotient, m - 1) + BuildString(remainder, m - 1);
    }
}

For ToQuickString() and ToStringMT(), the powers of 10 array needs to be initialized prior to using these functions. Initializing this array shouldn't be included in function execution time measurements because the array can be reused across subsequent calls, so its initialization cost is amortized over the lifetime of the program, not individual function calls.

For a production system I would set up a more automatic initialization, such as initializing a reasonable number of entries in the class static constructor and then checking in ToQuickString() or ToStringMT() to see if there are enough entries in the table to handle the given BigInteger. If not, go add enough entries to the table to handle the current BigInteger, then continue with the operation.

This ToStringMT function constructs the worker tasks manually to spread the remaining work out across 4 threads on the available execution cores in a multi core CPU. You could instead just make the original ToQuickString() function spin off half of its work into another thread on each recursion, but this quickly creates too many tasks and gets bogged down in task scheduling overhead. The recursion drills all the way down to individual decimal digits. I modified the BuildString() function to bail out earlier (m <= 8 instead of m == 0) because BigInteger.ToString() is faster than DivRem for smallish numbers.

90% of ToStringMt()'s execution time is taken up by the first DivRem call. It converges very quickly after that, but the first one is really painful.

dthorpe
  • 35,318
  • 5
  • 75
  • 119
  • I agree that when saving the BigInteger it is best to use Hex, and Binary. However I was looking for a faster way to save it in Base10. I realize that a 450k digit number is not going to be fully read, but non-the-less I was wishing to save the file in base 10 (just for the sake of doing so). – user1787963 Nov 02 '12 at 19:23
  • Converting to Base10 requires a lot of work. Conceptually, one long division (mod) for each digit. 450k long divisions will take awhile, especially on a BigInteger data where division is probably not implemented in hardware. If the performance of division by 10^n is less n*(division by 10), you could break the BigInteger up and distribute the decimalization work across multiple threads to take advantage of multiple execution cores. Might cut ToString() time by half on a 4 core CPU. Could work really well on CUDA GPU hardware with hundreds of cores. – dthorpe Nov 02 '12 at 21:50
  • Also, since you're really only interested in fast base 10 conversion of BigInteger, you might want to change the title of the question, or start a new question specifically about BigInteger and Base10 conversion to string. – dthorpe Nov 02 '12 at 21:54