5

I am working with really large bigint numbers and I need to write them to disk and read them back later, because they won't all fit in memory at once.

The current Chapel implementation first converts the bigint to a string and then writes that string to disk[1]. This is taking a really long time for large integers.

var outputFile = open("outputPath", iomode.cwr);  // open the file, creating it if needed
var writer = outputFile.writer();
writer.write(reallyLargeBigint);                  // converts the bigint to a string, then writes it
writer.close();
outputFile.close();

Is there any way to use GMP's mpz_out_raw()/mpz_inp_raw() [2], mpz_export()/mpz_import() [3], or a similar mechanism to dump the bigint's bytes to disk directly, without converting to a string first, and then read the bytes back into a bigint object?

Would that also work for a bigint array?

If that isn't possible in the current state, how could such functionality be added to Chapel's standard library?

[1] https://github.com/chapel-lang/chapel/blob/master/modules/standard/BigInteger.chpl#L346

[2] https://gmplib.org/manual/I_002fO-of-Integers.html

[3] https://gmplib.org/manual/Integer-Import-and-Export.html

zx228
  • As you may have noticed, Chapel's `BigInteger` library is just a thin wrapper around `GMP`. You can still use the `GMP` functions directly, and access the GMP type `mpz_t` via the `.mpz` field, so using `mpz_{in,out}_raw()` and `mpz_{import,export}()` should work. – ben-albrecht Dec 12 '17 at 20:48
  • A GitHub issue would be a better place for requesting to add this feature to the `BigInteger` module. If you're able to get the proposed raw or export approaches to work, it is probably worth contributing to the project as a pull request. – ben-albrecht Dec 12 '17 at 20:53
  • I'd love to contribute and integrate @David Iten's answer into Chapel directly. I'll look into this and make a pull request if I can manage it! – zx228 Dec 13 '17 at 09:12

2 Answers

3

The functions you mentioned aren't directly available in any Chapel modules, but you can write extern procs and extern types to access the GMP functions directly.

First we need to be able to work with C files, so declare some procedures and types for them:

extern type FILE;                                                // opaque C FILE type
extern type FILEptr = c_ptr(FILE);                               // FILE*
extern proc fopen(filename: c_string, mode: c_string): FILEptr;
extern proc fclose(fp: FILEptr);

Then we can declare the GMP functions we need:

// size_t mpz_out_raw (FILE *stream, const mpz_t op)
extern proc mpz_out_raw(stream: FILEptr, const op: mpz_t): size_t;
// size_t mpz_inp_raw (mpz_t rop, FILE *stream)
extern proc mpz_inp_raw(ref rop: mpz_t, stream: FILEptr): size_t;

Now we can use them to write a bigint value:

use BigInteger;
var res: bigint;
res.fac(100); // Compute 100!

writeln("Writing the number: ", res);

var f = fopen("gmp_outfile", "w");
mpz_out_raw(f, res.mpz);
fclose(f);

And read it back in from the file:

var readIt: bigint;

f = fopen("gmp_outfile", "r");
mpz_inp_raw(readIt.mpz, f);
fclose(f);

writeln("Read the number:", readIt);

For arrays of bigint values just loop over them to write or read them:

// initialize the array
var A: [1..10] bigint;
for i in 1..10 do
  A[i].fac(i);

// write the array to a file
f = fopen("gmp_outfile", "w");
for i in 1..10 do
  mpz_out_raw(f, A[i].mpz);
fclose(f);

// read the array back in from the file
var B: [1..10] bigint;
f = fopen("gmp_outfile", "r");
for i in 1..10 do
  mpz_inp_raw(B[i].mpz, f);
fclose(f);
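
To avoid repeating the file handling at every call site, the extern calls above could also be wrapped in small helper procedures. This is only a minimal sketch reusing the declarations from this answer; the helper names `writeBigints`/`readBigints` are hypothetical, and it assumes `string.c_str()` for passing the path to `fopen()`:

// Hypothetical wrappers around the extern calls declared above.
proc writeBigints(path: string, A: [] bigint) {
  var f = fopen(path.c_str(), "w");   // open the file for writing
  for i in A.domain do
    mpz_out_raw(f, A[i].mpz);         // dump each bigint's raw bytes
  fclose(f);
}

proc readBigints(path: string, ref A: [] bigint) {
  var f = fopen(path.c_str(), "r");   // open the file for reading
  for i in A.domain do
    mpz_inp_raw(A[i].mpz, f);         // read the next raw value into A[i]
  fclose(f);
}

With these in place, the array example above reduces to:

writeBigints("gmp_outfile", A);
readBigints("gmp_outfile", B);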
David Iten
  • `COMPUTE BigInt number: 14.0 [us]` `fileIO BigInt took: 4359.0 [us] / 158-digits` ( [TIME]-domain costs explode 2+ orders of magnitude to "save" a bit in [SPACE]-domain ) Facts matter more than the below expressed hate >>> https://stackoverflow.com/a/47780580 `fileIO BigInt took: 115872.0 [us] / 158-Mdigits` – user3666197 Dec 13 '17 at 08:50
  • @David Iten Thank you for your great answer. This worked for me! – zx228 Dec 13 '17 at 09:08
-5

Prologue:
The size of the data is a static attribute, but the data flow is always our worst enemy.

"Could such functionality be added to Chapel's standard library?"

Given the current price of adding a few more units, tens, or even many hundreds of [TB] of RAM capacity, the problem is IMHO never going to be solved by any extension of the language in the direction sketched above.


Why never? Because of the exploding costs:

If one spends just a few moments with the facts, the following latency map appears on a blank sheet of paper. While the respective numbers may vary a bit, the message lies in the orders of magnitude and in the dependency chain of the thought process:

            ________________________________________________________________________________________
           /                                                                                       /
          /                   ________________________________________________________            / 
         /                   /                                                       /           /  
        /                   /          xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx          /           /   
       /                   /          /                /                /          /           /    
      /                   / SOMEWHAT /   PRETTY       /  PROHIBITIVELY /          /           /     
     /  CHEAPEST         /  CHEAP   /    EXPENSIVE   /   EXPENSIVE    /          /           /      
    /   EVER            /   ZONE   /     ZONE       /    ZONE        /          /           /       
   /___________________/. . . . . / _ _ _ _ _ _ _ _/ ! ! ! ! ! ! ! !/          /           /_______________________
             /        /          /                /                /          /           /          /            /
   in-CACHE / in-RAM / CONVERT  / STORE          / RE-READ        / CONVERT  / in-RAM    / in-CACHE / in-CPU-uop /
      ~ + 5 [ns]     |          |                |                |          |           |          |            |
        + 5 [ns]     |          |                |                |          |           |          |            |
            |        |          |                |                |          |           |          |            |
            | ~ +300 [ns/kB]    |                |                |          |           |          |            |
            |   +300 [ns/kB]    |                |                |          |           |          |            |
            |        |          |                |                |          |           |          |            |
            |        |+VOLUME   [   GB]          |                |          |           |          |            |
            |        | x 100.000[ns/GB]          |                |          |           |          |            |
            |        |          |                |                |          |           |          |            |
            |        |          |+1              |                |          |           |          |            |
            |        |          | x    15.000.000[ns]             |          |           |          |            |
            |        |          |+VOLUME         [   GB]          |          |           |          |            |
            |        |          | x 3.580.000.000[ns/GB]          |          |           |          |            |
            |        |          |                |                |          |           |          |            |
            |        |          |                |+1 FIND         |          |           |          |            |
            |        |          |                | x    15.000.000[ns]       |           |          |            |
            |        |          |                |+1 DATA         |          |           |          |            |
            |        |          |                | x    15.000.000[ns]       |           |          |            |
            |        |          |                |+VOLUME         [   GB]    |           |          |            |
            |        |          |                | x 3.580.000.000[ns/GB]    |           |          |            |
            |        |          |                |                |          |           |          |            |
            |        |          |                |                |+VOLUME   [   GB]     |          |            |
            |        |          |                |                | x 100.000[ns/GB]     |          |            |
            |        |          |                |                |          |           |          |            |
            |        |          |                |                |          |    ~ +300 [ns/kB]    |            |
            |        |          |                |                |          |      +300 [ns/kB]    |            |
            |        |          |                |                |          |           |          |            |
            |        |          |                |                |          |           |    ~ + 5 [ns]         |
            |        |          |                |                |          |           |      + 5 [ns]         |
            |        |          |                |                |          |           |          |            |
            |        |          |                |                |          |           |          |    ~ + 0.3 [ns/uop]
            |        |          |                |                |          |           |          |      + 2.0 [ns/uop]

Last but not least, just calculate such a step's effect on the overall speedup, which will be << 1.0:

Given the original processing took XYZ [ns],
the "modified" processing will take:

                          XYZ [ns]                  : the PURPOSE
+ ( VOL [GB] *    300.000.000 [ns/GB] )             :            + MEM/CONVERT   
+ ( VOL [GB] *        100.000 [ns/GB] )             :            + CPU/CONVERT
+                  15.000.000 [ns]                  :            + fileIO SEEK
+ ( VOL [GB] *  3.580.000.000 [ns/GB] )             :            + fileIO STORE
+                  15.000.000 [ns]                  :            + fileIO SEEK / FAT
+                  15.000.000 [ns]                  :            + fileIO SEEK / DATA
+ ( VOL [GB] *  3.580.000.000 [ns/GB] )             :            + fileIO RE-READ
+ ( VOL [GB] *        100.000 [ns/GB] )             :            + CPU/CONVERT
+ ( VOL [GB] *    300.000.000 [ns/GB] )             :            + MEM/CONVERT   

_______________________________________________
                   45.000.XYZ [ns]
+               7.760.200.000 [ns/GB] * VOL [GB]

so the adversely impacted performance will get damaged (as Amdahl's Law shows):

              XYZ [ns]
S = ------------------------------------------------------------  << 1.00
     45.000.XYZ [ns]  +  7.760.200.000 [ns/GB] * VOL [GB]
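
For illustration, plugging VOL = 1 [GB] into the totals above:

   45.000.000 [ns] + 7.760.200.000 [ns/GB] * 1 [GB]  ~  7.8 [s]

of added overhead per round-trip, so unless the original in-memory processing XYZ itself costs on the order of seconds, S stays far below 1.0.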
  • This post is not addressing the OP's concern. The information is interesting, but not appropriate here. – Brian Dolan Dec 13 '17 at 00:20
  • Is it "*not addressing the concern*"? It does address it more than anything else. **Would you Brian ever spend $45.000.000 to receive but $300?**. In case you would dare to, your shareholders will be fair to call it a plain robbery. – user3666197 Dec 13 '17 at 04:58