3

I made a program in Java to calculate extreme factorials like that of 1 milion. What it does essentially is to start a loop from 1 to n and every iteration, multiply a BigDecimal with the value of the counter variable in the loop. Once the loop is done, it invokes BigDecimal#toPlainString() which returns the number produced as a String. The invokation of this method however takes very very long to execute. For example, in the code bellow:

public static void main(String[] args) {
        BigDecimal number = new BigDecimal(1);
        long startTime = System.currentTimeMillis();
        for (int i = 1; i < 500000; i++) {
            number = number.multiply(new BigDecimal(i));
        }
        System.out.println("Generating took: " + (System.currentTimeMillis() - startTime) + "ms. Creating String.");
        startTime = System.currentTimeMillis();
        String result = number.toPlainString();
        System.out.println("String generation took: " + (System.currentTimeMillis() - startTime) + "ms");
        FileUtils.writeStringToFile(new File("Path/To/File"), result);
    }

the output to console was:

Generating took: 181324ms. Creating String.
String generation took: 710498ms

which demonstrates how long the method toPlainString() took.

I understand that we are dealing with huge numbers (arround a milion digits in the example above) but i wanted to know if there is any way to speed up this method and how should i go about doing that?

Thanks!

EDIT #1: The only reason why the milisecond time calculations where added in the post is to bring 'long' into prespective, and possibly demonstrate the behaviour of the code in case the problem is not reproducable by all readers. What i am trying to do is determine why it took so long in my case, and most importantly how to speed up the process of converting to String.

Marco13
  • 53,703
  • 9
  • 80
  • 159
fill͡pant͡
  • 1,147
  • 2
  • 12
  • 24
  • Possible duplicate of [How do I write a correct micro-benchmark in Java?](http://stackoverflow.com/questions/504103/how-do-i-write-a-correct-micro-benchmark-in-java) – Erwin Bolwidt Jan 20 '17 at 01:21
  • 4
    This question is *in no way* related to microbenchmarking. When it takes long, then it takes long. JMH won't yield a faster string generation.... – Marco13 Jan 20 '17 at 01:26
  • @Marco13 It has *everything* to do with microbenchmarking. What do you mean with "when **it** takes long". "It" being - Hotspot compilation (method inlining)? Or garbage collection? And what is long? How do you know how much one piece of code gets optimized versus another? Is the first part already compiled to native code because it ran 500000 times and the second part not because it only ran one time? – Erwin Bolwidt Jan 20 '17 at 01:40
  • 2
    @ErwinBolwidt This is no mircobenchmark. It's a program that serves a certain purpose. Namely, computing this number and writing it to a file. And this takes long. And the asker wants to know how this can be made faster. **If** your hint should suggest that he should measure more exactly *what* is taking long, then I agree: I ran the program and had `160727ms` for the computation, but only `5663ms` for the string creation (I didn't write it to a file, though). So why does it take long for fillpant? Good question. Start it with `Xmx1000m` and `verbose:gc` to rule out memory issues, as a start. – Marco13 Jan 20 '17 at 01:47
  • @Marco13 I'll do that tomorrow and ill come back to you. For me in both machines i've tried (Windows, 10Gb RAM dedicated to the VM and a 4.6GHz AMD FX 8350 available and Ubuntu 2GB dedicated to VM, 1.6 GHz Intel i3) String generation takes almost 4 times the time taken for the creation of the BigDecimal. – fill͡pant͡ Jan 20 '17 at 01:55
  • @ErwinBolwidt Apologies if there is a confusion with my post. My purpose is not to measure time taken accurately. The only reason why i put it there is to bring 'long' into prespective and try to narrow down the behaviour as much as i can. – fill͡pant͡ Jan 20 '17 at 01:55
  • @fill͡pant͡ What version of Java are you running? – Erwin Bolwidt Jan 20 '17 at 02:01
  • @ErwinBolwidt Java 7 – fill͡pant͡ Jan 20 '17 at 12:14
  • @Marco13 Hey! I run the VM with Xmx1000m and saved the gc verbose to a log file which you can find here: http://pastebin.com/piduBTnn Not sure what that indicates really :S its an area i haven't touched on yet :( Thanks :- ) – fill͡pant͡ Jan 20 '17 at 13:24
  • The output does not include the `println` lines, so it's hard to determine where the string creation started. But in any case, the total GC time seems to be ~7 seconds, so this should not be the issue. I'd suggest to run the program in a profiler, for example, in jVisualVM ... – Marco13 Jan 20 '17 at 16:15
  • (BTW: I tested this with Java8 - I'd have to install a version 7 VM to see whether it makes a difference) – Marco13 Jan 20 '17 at 16:16
  • @Marco13 Spot on sir! Thats the issue! 1.7 took an iternity to generate the String, however 1.8 did it in just under 6 seconds! :O Thanks a lot! Would you please post it as an answer so i can select it as 'accepted' for future reference? :- ) Lastly, a question, will i be able to multithread the multiplication? i thought of spliting into threads each handling 1/5 of the number (n/5) and then when they are done multiply their results togeather. Any other way? Cheers! – fill͡pant͡ Jan 20 '17 at 22:36
  • Just stating that this is an issue of 1.7 would not be worth an answer. But now, I also have the chance to do my own tests with 1.7 and 1.8, try to figure out the differences, and (if I consider it worth it) post my insights as an answer. – Marco13 Jan 21 '17 at 01:40
  • @Marco13 Alright! This comment answered my question therefore i think would be an appropriate answer. If you include timing with 1.7 and 1.8 for the same piece of code where the time difference is stagering, then i bet it will be fine! Thanks again for your answer :- ) – fill͡pant͡ Jan 21 '17 at 01:41
  • A last side note: If you want to compute a factorial, then you may use `BigInteger` instead of `BigDecimal`. It will not make such a large difference (e.g. regarding the performance), but *conceptually*, there is no reason to use `BigDecimal` when you know that you are solely handling integral values. – Marco13 Jan 21 '17 at 15:14
  • @Marco13 I see, i might do that, i thought that `BigInteger#toString()` returned the exponential form of the number which wouldn't be very usefull but it apears that i was wrong! :o – fill͡pant͡ Jan 21 '17 at 15:26
  • Java 7's BigInteger (which is used inside large BigDecimals) uses the rather naive implementation: divide by 10, store the remainder as digit and repeat until 0. This means it repeatedly divides a huge number by 10. Java 8 uses a divide and conquer algorithm which divides the huge number into smaller numbers, recursively, and only does the naive approach if the numbers get very small. Then it puts the digits together. This is *much* faster, indeed. – Rudy Velthuis Jan 24 '17 at 08:22
  • @filipant: BigDecimal.toString() often returns an exponential form (rules in the docs). That is why .toPlainString() exists. BigInteger.toString() always returns the "plain" string. – Rudy Velthuis Jan 24 '17 at 08:28

2 Answers2

4

The reason why BigDecimal#PlainString takes very long to generate the string with Java 7 is: It was implemented very inefficiently in Java 7. Fortunately, it is much faster in Java 8.

Here, it may be important to note that in this particular case, it is not really the string creation in BigDecimal, but that in BigInteger. The value that is computed in the given example is a large factorial, and thus, effectively an integral value. The internal scale field of the BigDecimal will be 0 then, and having a look at the toPlainString method shows that in this case, the string value of the internal intVal field will be returned:

public String toPlainString() {
    if(scale==0) {
        if(intCompact!=INFLATED) {
            return Long.toString(intCompact);
        } else {
            return intVal.toString();
        }
    }
    ...
}

This intVal field is a BigInteger, and this is the actual culprit here.

The following program is not intended as a proper "microbenchmark", but only supposed to give an estimate of the performance: It creates several factorials, and generates the string representations of these:

import java.math.BigDecimal;

public class BigDecimalToPlainStringPerformance
{
    public static void main(String[] args)
    {
        for (int n = 10000; n <= 50000; n += 5000)
        {
            BigDecimal number = factorial(n);
            long before = System.nanoTime();
            String result = number.toPlainString();
            long after = System.nanoTime();

            double ms = (after - before) / 1e6;
            System.out.println(n + "! took " + ms +
                " ms, length " + result.length());
        }
    }

    private static BigDecimal factorial(int n)
    {
        BigDecimal number = new BigDecimal(1);
        for (int i = 1; i < n; i++)
        {
            number = number.multiply(new BigDecimal(i));
        }
        return number;
    }

}

With Java 7 (u07), on my (old) PC, the output is along the lines of

10000! took 514.98249 ms, length 35656
15000! took 1232.86507 ms, length 56126
20000! took 2364.799995 ms, length 77333
25000! took 3877.565724 ms, length 99090
30000! took 5814.925361 ms, length 121283
35000! took 8231.13608 ms, length 143841
40000! took 11088.823021 ms, length 166709
45000! took 14344.778177 ms, length 189850
50000! took 18155.089823 ms, length 213232

Fortunately, this performance problem has been fixed in Java 8. With Java 8 (u45), the output is

10000! took 77.20227 ms, length 35656
15000! took 113.811951 ms, length 56126
20000! took 188.293764 ms, length 77333
25000! took 261.328745 ms, length 99090
30000! took 355.001264 ms, length 121283
35000! took 481.912925 ms, length 143841
40000! took 610.812827 ms, length 166709
45000! took 698.80725 ms, length 189850
50000! took 840.87391 ms, length 213232

showing that the performance has been improved significantly.

From quickly skimming over the commit logs in the OpenJDK, there is one commit that is probably the most relevant here:

Accelerate conversion to string by means of Schoenhage recursive base conversion

(I did not verify this, but it seems the only one which dedicatedly aimed at improving the toString performance)

Marco13
  • 53,703
  • 9
  • 80
  • 159
  • Thats a great explanation! Thanks! :- ) – fill͡pant͡ Jan 21 '17 at 15:29
  • FWIW, the Schoenhage algorithm is only mentioned in the Java sources. I searched the web, but all I could find is about Schoenhage-Strassen, which is not the same. I could not find any other mention of a Schoenhage or Schönhage algorithm for base conversion. I had developed a very similar algorithm for my own BigInteger all on my own (for Delphi, not for Java), only to find that almost the same algorithm is used by Java and was, somehow, developed by Schoenhage. – Rudy Velthuis Jan 26 '17 at 11:31
2

First off, your benchmark is not reproducible. In your case, the factorial part took 181324ms and the string generation took 710498ms, so string generation was 710498 / 181324 = 3.9 times as slow as the factorial part.

When I run it one time just like you wrote it, it gives these results.

Generating took: 90664ms. Creating String.
String generation took: 3465ms

So the string generation is 90644 / 3465 = 26 times as fast as the factorial part.

When you run a benchmark, you need to run it many times to take the average. Especially when you do a long-running microbenchmark like yours, because so many other things may be going on in your computer at the same time - perhaps your virus checker kicked in, or your Java process got swapped out to disk due to low memory, or Java decided to do a full garbage collection.

Secondly, you're not warming up the VM, so it's unclear what you are benchmarking. Are you benchmarking the HotSpot compilation engine's native compiler or your actual code? That's why you always need to warm up the VM before running microbenchmarks like yours.

The best way to do that is to use a proper microbenchmarking framework. An alternative is to run your code more times (using a for-loop) and when it settles down on timings that don't decrease anymore, you have a good indication that warm up has completed and you can take the average of the next couple of runs to come up with a number.

Running it on my MacBookPro, this resulted in an average of 80144 ms for the factorial part, and 2839 ms for the string generation (note I didn't look into memory usage yet for this).

So the string generation is 80144 / 2839 = 28 times as fast as the factorial part.

If you can reproduce the same result multiple times on your machine, while you are not touching it at all when you run your program, and your machine has enough memory, then there is something interesting going on. But the problem is not in the toPlainString() method in BigDecimal - that method is much faster than the factorial part of your code.

Erwin Bolwidt
  • 30,799
  • 15
  • 56
  • 79
  • 2
    Sorry for being so insistive here, but: *This is not about microbenchmarking*. When you run some JMH benchmark (whatever it should look like), involving `multiply` and `toPlainString` for numbers of different lengths, then you may figure out how many ms each operation takes. And then, *the actual program is still slow!*. (Again: I agree that the *reason* for *why* it is slow has to be analyzed, but am sure that a microbenchmark is not the way to go here. Rather, a profiler run with jVisualVM or Java Flight Recorder) – Marco13 Jan 20 '17 at 12:09
  • Thanks for the answer, This behaviour did not happend just once though, it has happened many times before with different factorial numbers. Infact, for the factorial of 10,000,000 String generation took 3 hours. In this problem i am not trying to have an accurate measurement of time however, all im trying to do is speed up a process that takes way longer than it should. As for memory i have dedicated 10/16GB to the VM which i think is more than enough, the processor was at 3% throughout hiting peaks of 10%. Tried the same on a lower spec laptop runing ubuntu and the problem was the same! – fill͡pant͡ Jan 20 '17 at 12:13
  • @fill͡pant͡ "the processor was at 3%" - That's it. The core must be at 100% during the whole computation (the system may report it as 25% in case of 4 cores). So you CPU did mostly nothing. Why? I'd bet your computer wasted most of time swapping. – maaartinus Jan 20 '17 at 13:54
  • @maaartinus Hmm interesting, its an 8 core so it makes sense. Well the same issue however occured in the lower spec ubuntu laptop :S How can that be explained? Thanks! – fill͡pant͡ Jan 20 '17 at 14:13