
I opened this issue on the GitHub project prevayler-clj:

https://github.com/klauswuestefeld/prevayler-clj/issues/1

The reason: 1M short vectors such as [:a1 1], which make up the Prevayler state, produce a 1GB file when serialized one by one with Java's writeObject.

Is that plausible, about 1kB for each PersistentVector? Further investigation showed that the same number of vectors can be serialized into an 80MB file. So what is going wrong in Prevayler's serialization? Am I doing something wrong in these tests? Please refer to the GitHub issue for excerpts of my test code.

icamts
  • Yes. The question is about the size of Java-serialized Clojure data structures. Can a two-element vector be 1kB in size? Why does serialization in my REPL experiment (see the code behind the link in the question) produce an 80MB byte buffer while the Prevayler logs are 1GB? – icamts Aug 07 '15 at 13:35
  • My hypothesis: is Prevayler serializing the class definitions every time? If so, why does this happen? My understanding of the Prevayler code suggests it should behave the same way as my experimental code. – icamts Aug 07 '15 at 13:44
  • I see now. It took a careful study of content from an outside resource (SO has different standards than you may be used to). So Prevayler may be concatenating output from independent ObjectOutputStreams. I can see how that might arise in an FP-esque architecture. – Marko Topolnik Aug 07 '15 at 14:32

2 Answers


There's nothing wrong with Prevayler per se. It's just that Java's writeObject method is not tuned for writing Clojure data; it is intended to store the internal structure of any serializable Java object. Since Clojure vectors are fairly complex Java objects under the hood, I'm not very surprised that a small vector may write out as roughly a kB of data.
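To see where the bytes go, here is a minimal Java sketch that measures the serialized size of one small object written to a fresh ObjectOutputStream. It assumes a java.util.ArrayList holding a String and an Integer as a stand-in for a Clojure vector like [:a1 1]. Clojure's PersistentVector is not on the classpath here, so the absolute numbers will differ. The class name SerializedSize is hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

class SerializedSize {

    // Serializes obj into a brand-new ObjectOutputStream and returns the byte
    // count: stream header + class descriptors + the object's field data.
    static int sizeOf(Object obj) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(obj);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.size();
    }

    public static void main(String[] args) {
        // Stand-in for the Clojure vector [:a1 1] (assumption: PersistentVector
        // itself has a different, larger object graph).
        List<Object> pair = new ArrayList<>();
        pair.add("a1");
        pair.add(1);
        System.out.println("2 elements serialized to " + sizeOf(pair) + " bytes");
    }
}
```

Most of that output is the stream header and the class descriptors for ArrayList, Integer and Number, not the two element values themselves.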

I'd guess that pretty much any Clojure-specific serialization method would result in smaller files. In my experience, the standard clojure.core/pr + clojure.core/read gives a good balance between file size and speed, and handles data structures of nearly any size.

See these pages for some insight into the internals of Clojure vectors:

Joost Diepenmaat
  • 1. A basic container class like `PersistentVector` would be expected to have an optimized binary format for standard Java serialization, though. 2. The OP indicates that Prevayler may be using Java serialization the wrong way, serializing each vector into an independent ObjectOutputStream and then concatenating the results. This prevents the reuse of class definitions. – Marko Topolnik Aug 07 '15 at 14:36
  • @JoostDiepenmaat Thanks for your answer. – icamts Aug 08 '15 at 07:45
  • @MarkoTopolnik Thanks for your hints. I've found the difference between my tests and the Prevayler code. Prevayler calls `(.reset obj-out)` at line 42. So it uses the same `ObjectOutputStream`, but as a result it serializes the class definitions on every write. Klaus (Prevayler's author) confirms it's not a bug but a feature to prevent leaks. If you rewrite your comments as an answer, I will be happy to accept it. – icamts Aug 08 '15 at 07:52
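Here is a minimal Java sketch of the effect described in that comment. It assumes small ArrayList pairs as stand-ins for Clojure vectors, and the names ResetCost and pair are hypothetical. One ObjectOutputStream is used throughout. The only difference between the two runs is whether reset() is called after each writeObject.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

class ResetCost {

    // Hypothetical stand-in for a small vector like [:a1 1].
    static List<Object> pair(int i) {
        List<Object> p = new ArrayList<>();
        p.add("a" + i);
        p.add(i);
        return p;
    }

    // Writes `count` pairs through ONE ObjectOutputStream and returns the total
    // byte count; with resetEachWrite, reset() follows every writeObject.
    static int totalBytes(int count, boolean resetEachWrite) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            for (int i = 0; i < count; i++) {
                out.writeObject(pair(i));
                if (resetEachWrite) {
                    // Discards the stream's table of already-written objects and
                    // class descriptors, so the next write re-emits the descriptors.
                    out.reset();
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.size();
    }

    public static void main(String[] args) {
        int n = 10_000;
        System.out.println("no reset:          " + totalBytes(n, false) + " bytes");
        System.out.println("reset every write: " + totalBytes(n, true) + " bytes");
    }
}
```

Without reset(), the class descriptors are written once and later objects refer back to them by handle. With reset() after every write, each entry carries its own descriptors, which is the pattern that inflates the log.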

Prevayler apparently starts a fresh ObjectOutputStream for each serialized element, preventing any reuse of class data between them. Your test code, on the other hand, is written the "natural" way, allowing reuse. What forces Prevayler to restart every time is not clear to me, but I would hesitate to call it a "feature", given the negative impact it has; "workaround" is the more likely designation.
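For comparison, here is a hedged Java sketch of the two patterns this answer contrasts. The first opens one independent ObjectOutputStream per element and concatenates the outputs. The second is the "natural" single stream. The names FreshStreams, freshStreamPerElement and singleStream are hypothetical, and ArrayList pairs stand in for Clojure vectors.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

class FreshStreams {

    // Hypothetical stand-in for a small vector like [:a1 1].
    static List<Object> pair(int i) {
        List<Object> p = new ArrayList<>();
        p.add("a" + i);
        p.add(i);
        return p;
    }

    // Each element gets its own ObjectOutputStream (own header, own class
    // descriptors); the independent outputs are appended to one log buffer.
    static int freshStreamPerElement(int count) {
        ByteArrayOutputStream log = new ByteArrayOutputStream();
        for (int i = 0; i < count; i++) {
            ByteArrayOutputStream one = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(one)) {
                out.writeObject(pair(i));
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
            byte[] chunk = one.toByteArray();
            log.write(chunk, 0, chunk.length);
        }
        return log.size();
    }

    // The "natural" way: one stream for all elements, so class descriptors are
    // written once and later objects refer back to them by handle.
    static int singleStream(int count) {
        ByteArrayOutputStream log = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(log)) {
            for (int i = 0; i < count; i++) {
                out.writeObject(pair(i));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return log.size();
    }

    public static void main(String[] args) {
        int n = 10_000;
        System.out.println("fresh stream per element: " + freshStreamPerElement(n) + " bytes");
        System.out.println("single stream:            " + singleStream(n) + " bytes");
    }
}
```

Besides the size cost, a log built the first way cannot be read back with a single ObjectInputStream. Each chunk starts with its own stream header, which a single reader rejects (typically as `StreamCorruptedException: invalid type code: AC`).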

Marko Topolnik
  • Just for completeness: Klaus explained that it is memory leaks he wants to prevent. The OOS instance keeps references to the transactions, preventing them from being GCed. He says calling reset every 100 transactions would be a good compromise between memory footprint and log file size. – icamts Aug 09 '15 at 08:49
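Here is a minimal sketch of that compromise using a hypothetical PeriodicResetWriter (this is not Prevayler's API). reset() is called only every `resetInterval` writes. That bounds how many written objects the stream's handle table keeps strongly reachable, while class descriptors are re-emitted only once per interval. ArrayList pairs again stand in for transactions.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical journal writer, not Prevayler's code.
class PeriodicResetWriter implements AutoCloseable {
    private final ObjectOutputStream out;
    private final int resetInterval;
    private int writesSinceReset = 0;

    PeriodicResetWriter(OutputStream sink, int resetInterval) throws IOException {
        if (resetInterval < 1) {
            throw new IllegalArgumentException("resetInterval must be >= 1");
        }
        this.out = new ObjectOutputStream(sink);
        this.resetInterval = resetInterval;
    }

    void write(Object transaction) throws IOException {
        out.writeObject(transaction);
        if (++writesSinceReset >= resetInterval) {
            // Drops the handle table, releasing references to the objects
            // written since the previous reset.
            out.reset();
            writesSinceReset = 0;
        }
        out.flush(); // push the entry to the sink (a real journal would also sync to disk)
    }

    @Override
    public void close() throws IOException {
        out.close();
    }

    // Total log size for `count` small pairs at a given reset interval.
    static int totalBytes(int count, int resetInterval) {
        ByteArrayOutputStream log = new ByteArrayOutputStream();
        try (PeriodicResetWriter w = new PeriodicResetWriter(log, resetInterval)) {
            for (int i = 0; i < count; i++) {
                List<Object> pair = new ArrayList<>();
                pair.add("a" + i);
                pair.add(i);
                w.write(pair);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return log.size();
    }

    public static void main(String[] args) {
        for (int interval : new int[] {1, 100, Integer.MAX_VALUE}) {
            System.out.println("reset interval " + interval + ": "
                    + totalBytes(10_000, interval) + " bytes");
        }
    }
}
```

An interval of 100 matches the figure Klaus suggested. The right value presumably depends on transaction size and on how much memory the retained references may use between resets.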