vector<bool> strikes again
It's actually allocating for 0x1fffffffff20000 bits (that's 144 petabits) on my test box. That's coming directly from IndexSet::resize().
Now I have serious questions about HElib using std::vector<bool> here (it seems they would be far better served by something like boost::icl::interval_set<>).
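To make the interval idea concrete, here is a minimal standard-library sketch. It is not HElib code and not Boost.ICL: `Indices`, `Intervals`, `to_intervals` and `from_intervals` are names I made up, with std::set<long> standing in for IndexSet. The point is only that storing runs as closed intervals keeps the size proportional to the number of runs, not to the span between the smallest and largest index.

```cpp
#include <cassert>
#include <set>
#include <utility>
#include <vector>

// Hypothetical stand-in for helib::IndexSet: an ordered set of indices.
using Indices = std::set<long>;
// Each pair is a closed interval [first, second].
using Intervals = std::vector<std::pair<long, long>>;

// Collapse runs of consecutive indices into closed intervals.
Intervals to_intervals(const Indices& indices) {
    Intervals out;
    for (long n : indices) {
        if (!out.empty() && out.back().second + 1 == n)
            out.back().second = n;   // n extends the current run
        else
            out.emplace_back(n, n);  // n starts a new run
    }
    return out;
}

// Expand the intervals back into individual indices.
Indices from_intervals(const Intervals& intervals) {
    Indices out;
    for (const auto& iv : intervals) {
        for (long n = iv.first;; ++n) {
            out.insert(n);
            if (n == iv.second) break;  // stop before ++n could overflow
        }
    }
    return out;
}
```

A set like {1, 2, 3, 10, 1000000000, 1000000001} becomes three intervals, whereas an offset-plus-bitmap encoding of the same set needs about a billion bits.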

Well. That was a wild goose chase (though that IndexSet serialization can be much improved). However, the real problem is that you had Undefined Behaviour because you don't deserialize the same type as you serialize.
You serialize a PubKey, but attempt to deserialize as PubKey*. Uh-oh.
Now beyond that, there are quite a few problems:
You had to modify the library to make private members public. This can easily violate ODR (making the class layout incompatible).
You seem to treat the context as a "dynamic" resource, which will engage Object Tracking. This could be a viable approach. BUT. You'll have to think about ownership.
It seems like you didn't do that yet. For example, the line in load_construct_data for DoubleCRT is a definite memory leak:
helib::Context * context = new helib::Context(2,3,1);
You never use it nor ever free it. In fact, you simply overwrite it with the deserialized instance, which may or may not be owned. Catch-22.
Exactly the same happens in load_construct_data for PubKey.
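The leak pattern and the owned-pointer alternative can be sketched without HElib or Boost. Everything here is hypothetical: `Context` is a toy that counts live instances, and `deserialize_raw` merely stands in for whatever hands you a freshly allocated object.

```cpp
#include <cassert>
#include <memory>

// Hypothetical stand-in for helib::Context that counts live instances.
struct Context {
    static int live;
    Context() { ++live; }
    Context(const Context&) { ++live; }
    ~Context() { --live; }
};
int Context::live = 0;

// Stand-in for "archive >> context": returns a freshly allocated object.
Context* deserialize_raw() { return new Context(); }

// The pattern from the question: allocate, then overwrite the only pointer.
Context* leaky_load() {
    Context* context = new Context();  // placeholder that is never used
    context = deserialize_raw();       // the placeholder is now unreachable
    return context;                    // caller may or may not own this
}

// Ownership made explicit: no placeholder, and the result is owned.
std::unique_ptr<Context> owning_load() {
    return std::unique_ptr<Context>(deserialize_raw());
}
```

With `owning_load()` the live count returns to where it started once the unique_ptr goes out of scope. With `leaky_load()`, even a caller that dutifully deletes the result is left with one Context that nothing can free.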
Worse, in save_construct_data you completely gratuitously copy context objects for each DoubleCRT in each SecKey:
auto context = polynomial->getContext();
archive << &context;
Because you fake it out as pointer serialization, (obviously useless) object tracking kicks in again, which just means you serialize redundant Context copies that will all be leaked on deserialization.
I'd be tempted to assume the context instances in both would always be the same? Why not serialize the context(s) separately anyways?
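Why the tracking is useless there: tracking works on addresses, and `auto` drops the reference, so every call hands the archive the address of a fresh local copy. A small sketch with toy types shows the difference; `Context` and `Polynomial` here are hypothetical, and I'm assuming getContext() returns a const reference, as the copy in your snippet suggests.

```cpp
#include <cassert>

// Hypothetical stand-ins; only the copy behaviour matters here.
struct Context {
    static int copies;
    Context() = default;
    Context(const Context&) { ++copies; }
};
int Context::copies = 0;

struct Polynomial {
    const Context& context;
    const Context& getContext() const { return context; }
};

// `auto` deduces Context (not const Context&), so this is a local copy
// whose address differs from the shared object's address.
bool copy_has_same_address(const Polynomial& p) {
    auto context = p.getContext();
    return &context == &p.getContext();
}

// `const auto&` binds to the shared object itself: no copy, same address.
bool reference_has_same_address(const Polynomial& p) {
    const auto& context = p.getContext();
    return &context == &p.getContext();
}
```

Only the second form gives address-based tracking a chance to notice that two polynomials share one context.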
In fact I went and analyzed the HElib source code to check these assumptions. It turns out I was correct. Nothing ever constructs a context outside of:
std::unique_ptr<Context> buildContextFromBinary(std::istream& str);
std::unique_ptr<Context> buildContextFromAscii(std::istream& str);
As you can see, they return owned pointers. You should have been using them. Perhaps even with the built-in serialization, that I practically stumble over here.
Time To Regroup
I'd use the serialization code from HElib (because, why reinvent the wheel and make a ton of bugs doing so?). If you insist on integration with Boost Serialization, you can have your cake and eat it:
template <class Archive> void save(Archive& archive, const helib::PubKey& pubkey, unsigned) {
    using V = std::vector<char>;
    using D = iostreams::back_insert_device<V>;
    V data;
    {
        D dev(data);
        iostreams::stream_buffer<D> sbuf(dev);
        std::ostream os(&sbuf); // expose as std::ostream
        helib::writePubKeyBinary(os, pubkey);
    }
    archive << data;
}
template <class Archive> void load(Archive& archive, helib::PubKey& pubkey, unsigned) {
    std::vector<char> data;
    archive >> data;
    using S = iostreams::array_source;
    S source(data.data(), data.size());
    iostreams::stream_buffer<S> sbuf(source);
    {
        std::istream is(&sbuf); // expose as std::istream
        helib::readPubKeyBinary(is, pubkey);
    }
}
That's all. 24 lines of code. And it's gonna be tested and maintained by the library authors. You can't beat that (clearly). I've modified the tests a bit so we don't abuse private details anymore.
Cleaning Up The Code
By separating out a helper to deal with the blob writing, we can implement different helib types in a very similar way:
namespace helib { // leverage ADL
    template <class A> void save(A& ar, const Context& o, unsigned) {
        Blob data = to_blob(o, writeContextBinary);
        ar << data;
    }
    template <class A> void load(A& ar, Context& o, unsigned) {
        Blob data;
        ar >> data;
        from_blob(data, o, readContextBinary);
    }
    template <class A> void save(A& ar, const PubKey& o, unsigned) {
        Blob data = to_blob(o, writePubKeyBinary);
        ar << data;
    }
    template <class A> void load(A& ar, PubKey& o, unsigned) {
        Blob data;
        ar >> data;
        from_blob(data, o, readPubKeyBinary);
    }
}
This is elegance to me.
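If you want to play with the helper shape without Boost or HElib installed, here is a standard-library-only sketch of the same idea. Note that it swaps Boost.Iostreams for std::ostringstream/std::istringstream, which costs an extra copy of the bytes, and that `Point`, `writePointBinary` and `readPointBinary` are hypothetical toys that merely mimic the shape of the library's stream functions.

```cpp
#include <cassert>
#include <istream>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

using Blob = std::vector<char>;

// Run any writer(std::ostream&, const T&) and capture the bytes it emits.
template <typename T, typename F>
Blob to_blob(const T& object, F writer) {
    std::ostringstream os(std::ios::binary);
    writer(os, object);
    const std::string bytes = os.str();
    return Blob(bytes.begin(), bytes.end());
}

// Feed the captured bytes to any reader(std::istream&, T&).
template <typename T, typename F>
void from_blob(const Blob& data, T& object, F reader) {
    const std::string bytes(data.begin(), data.end());
    std::istringstream is(bytes, std::ios::binary);
    reader(is, object);
}

// Hypothetical toy type with library-style stream functions.
struct Point {
    int x = 0;
    int y = 0;
};
void writePointBinary(std::ostream& os, const Point& p) {
    os.write(reinterpret_cast<const char*>(&p.x), sizeof p.x);
    os.write(reinterpret_cast<const char*>(&p.y), sizeof p.y);
}
void readPointBinary(std::istream& is, Point& p) {
    is.read(reinterpret_cast<char*>(&p.x), sizeof p.x);
    is.read(reinterpret_cast<char*>(&p.y), sizeof p.y);
}
```

The helpers never need to know what T is; they only need a function pair that speaks std::ostream/std::istream, which is why adding another helib type is a few lines.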
FULL LISTING
I have cloned a new gist https://gist.github.com/sehe/ba82a0329e4ec586363eb82d3f3b9326 that includes the following change-sets:
0079c07 Make it compile locally
b3b2cf1 Squelch the warnings
011b589 Endof investigations, regroup time
f4d79a6 Reimplemented using HElib binary IO
a403e97 Bitwise reproducible outputs
Only the last two commits contain changes related to the actual fixes.
I'll list the full code here too for posterity. There are a number of subtle reorganizations and ditto comments in the test code. You'd do well to read through them carefully to see whether you understand them and the implications suit your needs. I left comments describing why the test assertions are what they are to help.
File serialization.hpp
#ifndef EVOTING_SERIALIZATION_H
#define EVOTING_SERIALIZATION_H
#define BOOST_TEST_MODULE main
#include <helib/helib.h>
#include <boost/serialization/split_free.hpp>
#include <boost/serialization/vector.hpp>
#include <boost/iostreams/stream_buffer.hpp>
#include <boost/iostreams/device/back_inserter.hpp>
#include <boost/iostreams/device/array.hpp>
namespace /* file-static */ {
    using Blob = std::vector<char>;
    template <typename T, typename F>
    Blob to_blob(const T& object, F writer) {
        using D = boost::iostreams::back_insert_device<Blob>;
        Blob data;
        {
            D dev(data);
            boost::iostreams::stream_buffer<D> sbuf(dev);
            std::ostream os(&sbuf); // expose as std::ostream
            writer(os, object);
        }
        return data;
    }
    template <typename T, typename F>
    void from_blob(Blob const& data, T& object, F reader) {
        boost::iostreams::stream_buffer<boost::iostreams::array_source>
            sbuf(data.data(), data.size());
        std::istream is(&sbuf); // expose as std::istream
        reader(is, object);
    }
}
namespace helib { // leverage ADL
    template <class A> void save(A& ar, const Context& o, unsigned) {
        Blob data = to_blob(o, writeContextBinary);
        ar << data;
    }
    template <class A> void load(A& ar, Context& o, unsigned) {
        Blob data;
        ar >> data;
        from_blob(data, o, readContextBinary);
    }
    template <class A> void save(A& ar, const PubKey& o, unsigned) {
        Blob data = to_blob(o, writePubKeyBinary);
        ar << data;
    }
    template <class A> void load(A& ar, PubKey& o, unsigned) {
        Blob data;
        ar >> data;
        from_blob(data, o, readPubKeyBinary);
    }
}
BOOST_SERIALIZATION_SPLIT_FREE(helib::Context)
BOOST_SERIALIZATION_SPLIT_FREE(helib::PubKey)
#endif //EVOTING_SERIALIZATION_H
File test-serialization.cpp
#define BOOST_TEST_MODULE main
#include <boost/test/included/unit_test.hpp>
#include <helib/helib.h>
#include <fstream>
#include "serialization.hpp"
#include <boost/archive/text_oarchive.hpp>
#include <boost/archive/text_iarchive.hpp>
#include <boost/archive/binary_oarchive.hpp>
#include <boost/archive/binary_iarchive.hpp>
helib::Context helibTestMinimalContext(){
    // Plaintext prime modulus
    unsigned long p = 4999;
    // Cyclotomic polynomial - defines phi(m)
    unsigned long m = 32109;
    // Hensel lifting (default = 1)
    unsigned long r = 1;
    return helib::Context(m, p, r);
}
helib::Context helibTestContext(){
    auto context = helibTestMinimalContext();
    // Number of bits of the modulus chain
    unsigned long bits = 300;
    // Number of columns of Key-Switching matrix (default = 2 or 3)
    unsigned long c = 2;
    // Modify the context, adding primes to the modulus chain
    buildModChain(context, bits, c);
    return context;
}
BOOST_AUTO_TEST_CASE(serialization_pubkey) {
    auto context = helibTestContext();
    helib::SecKey secret_key(context);
    secret_key.GenSecKey();
    // Compute key-switching matrices that we need
    helib::addSome1DMatrices(secret_key);
    // Set the secret key (upcast: SecKey is a subclass of PubKey)
    const helib::PubKey& original_pubkey = secret_key;
    std::string const filename = "pubkey.serialized";
    {
        std::ofstream os(filename, std::ios::binary);
        boost::archive::binary_oarchive oarchive(os);
        oarchive << context << original_pubkey;
    }
    {
        // just checking reproducible output
        std::ofstream os(filename + ".2", std::ios::binary);
        boost::archive::binary_oarchive oarchive(os);
        oarchive << context << original_pubkey;
    }
    // reading back to independent instances of Context/PubKey
    {
        // NOTE: if you start from something rogue, it will fail with PAlgebra mismatch.
        helib::Context surrogate = helibTestMinimalContext();
        std::ifstream ifs(filename, std::ios::binary);
        boost::archive::binary_iarchive iarchive(ifs);
        iarchive >> surrogate;
        // we CAN test that the contexts end up matching
        BOOST_TEST((context == surrogate));
        helib::SecKey independent(surrogate);
        helib::PubKey& indep_pk = independent;
        iarchive >> indep_pk;
        // private again, as it should be, but to understand the relation:
        // BOOST_TEST((&independent.context == &surrogate));
        // The library's operator== compares the reference, so it would say "not equal"
        BOOST_TEST((indep_pk != original_pubkey));
        {
            // just checking reproducible output
            std::ofstream os(filename + ".3", std::ios::binary);
            boost::archive::binary_oarchive oarchive(os);
            oarchive << surrogate << indep_pk;
        }
    }
    // doing it the other way (sharing the context):
    {
        helib::PubKey restored_pubkey(context);
        {
            std::ifstream ifs(filename, std::ios::binary);
            boost::archive::binary_iarchive iarchive(ifs);
            iarchive >> context >> restored_pubkey;
        }
        // now `operator==` confirms equality
        BOOST_TEST((restored_pubkey == original_pubkey));
        {
            // just checking reproducible output
            std::ofstream os(filename + ".4", std::ios::binary);
            boost::archive::binary_oarchive oarchive(os);
            oarchive << context << restored_pubkey;
        }
    }
}
TEST OUTPUT
time ./test-serialization -l all -r detailed
Running 1 test case...
Entering test module "main"
test-serialization.cpp(34): Entering test case "serialization_pubkey"
test-serialization.cpp(61): info: check (context == surrogate) has passed
test-serialization.cpp(70): info: check (indep_pk != original_pubkey) has passed
test-serialization.cpp(82): info: check (restored_pubkey == original_pubkey) has passed
test-serialization.cpp(34): Leaving test case "serialization_pubkey"; testing time: 36385217us
Leaving test module "main"; testing time: 36385273us
Test module "main" has passed with:
1 test case out of 1 passed
3 assertions out of 3 passed
Test case "serialization_pubkey" has passed with:
3 assertions out of 3 passed
real 0m36,698s
user 0m35,558s
sys 0m0,850s
Bitwise Reproducible Outputs
On repeated serialization it appears that indeed the output is bitwise identical, which may be an important property:
sha256sum pubkey.serialized*
66b95adbd996b100bff58774e066e7a309e70dff7cbbe08b5c77b9fa0f63c97f pubkey.serialized
66b95adbd996b100bff58774e066e7a309e70dff7cbbe08b5c77b9fa0f63c97f pubkey.serialized.2
66b95adbd996b100bff58774e066e7a309e70dff7cbbe08b5c77b9fa0f63c97f pubkey.serialized.3
66b95adbd996b100bff58774e066e7a309e70dff7cbbe08b5c77b9fa0f63c97f pubkey.serialized.4
Note that it is (obviously) not identical across runs (because it generates different key material).
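If you'd rather assert this property inside the test than eyeball checksums, a byte-for-byte comparison is enough. This is a minimal sketch: `slurp` and `same_bytes` are hypothetical helper names, and it reads both streams fully into memory, which is fine for key files of this size but not something to do with huge inputs.

```cpp
#include <cassert>
#include <istream>
#include <iterator>
#include <sstream>  // only needed when comparing in-memory streams
#include <string>

// Read everything that is left in a stream into a string of raw bytes.
std::string slurp(std::istream& is) {
    using It = std::istreambuf_iterator<char>;
    return std::string(It(is), It());
}

// True when both streams yield exactly the same remaining bytes.
bool same_bytes(std::istream& a, std::istream& b) {
    return slurp(a) == slurp(b);
}
```

In the test you would open two of the output files as std::ifstream with std::ios::binary and pass them to `same_bytes`.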
Side Quest (The Wild Goose Chase)
One way to improve the IndexSet serialization code manually is to also use vector<bool>:
template<class Archive>
void save(Archive & archive, const helib::IndexSet & index_set, const unsigned int version){
    std::vector<bool> elements;
    elements.resize(index_set.last()-index_set.first()+1);
    for (auto n : index_set)
        elements[n-index_set.first()] = true;
    archive << index_set.first() << elements;
}
template<class Archive>
void load(Archive & archive, helib::IndexSet & index_set, const unsigned int version){
    long first_ = 0;
    std::vector<bool> elements;
    archive >> first_ >> elements;
    index_set.clear();
    for (size_t n = 0; n < elements.size(); ++n) {
        if (elements[n])
            index_set.insert(n+first_);
    }
}
A better idea would be to use dynamic_bitset (for which I happen to have contributed the serialization code; see How to serialize boost::dynamic_bitset?):
template<class Archive>
void save(Archive & archive, const helib::IndexSet & index_set, const unsigned int version){
    boost::dynamic_bitset<> elements;
    elements.resize(index_set.last()-index_set.first()+1);
    for (auto n : index_set)
        elements.set(n-index_set.first());
    archive << index_set.first() << elements;
}
template<class Archive>
void load(Archive & archive, helib::IndexSet & index_set, const unsigned int version) {
    long first_ = 0;
    boost::dynamic_bitset<> elements;
    archive >> first_ >> elements;
    index_set.clear();
    // find_first/find_next return npos when no further bit is set
    for (size_t n = elements.find_first(); n != boost::dynamic_bitset<>::npos; n = elements.find_next(n))
        index_set.insert(n+first_);
}
Of course, you would likely have to do similar things for IndexMap.