1

I have a graph algorithm that generates intermediate results associated to different nodes. Currently, I have solved this by using a ConcurrentHashMap<Node, List<Result> (I am running multithreaded). So at first I add new results with map.get(node).add(result) and then I consume all results for a node at once with map.get(node).

However, I need to run on a pretty large graph where the number of intermediate results wan't fit into memory (good old OutOfMemory Exception). So I require some solution to write out the results on disk—because that's where there is still space.

Having looked at a lot of different "off-heap" maps and caches as well as MapDB I figured they are all not a fit for me. All of them don't seem to support Multimaps (which I guess you can call my map) or mutable values (which the list would be). Additionally, MapDB has been very slow for me when trying to create a new collection for every node (even with a custom serializer based on FST).

I can barely imagine, though, that I am the first and only to have such a problem. All I need is a mapping from a key to a list which I only need to extend or read as a whole. What would an elegant and simple solution look like? Or are there any existing libraries that I can use for this?

Thanks in advance for saving my week :).

EDIT
I have seen many good answers, however, I have two important constraints: I don't want to depend on an external database (e.g. Redis) and I can't influence the heap size.

Deproblemify
  • 3,340
  • 5
  • 26
  • 41
  • 1
    possible duplicate https://stackoverflow.com/questions/1565388/increase-heap-size-in-java – Akash Shah Mar 15 '19 at 13:14
  • 2
    Depending on what kind of `List` implementation you use in there, `map.get(node).add(result)` is not necessarily thread-safe, even if the `map` is. – Thilo Mar 15 '19 at 13:32
  • 2
    Is your problem "Data won't fit in Java's heap" or "Data won't fit in any available memory (RAM or swap) I can give to Java"? For the latter, the answer is quite often "use a database". – Jiri Tousek Mar 15 '19 at 13:52
  • You can try `EhCache` and instead of `ConcurrentHashMap>` use `Cache>`. And it is possible to configure `EhCache` to dump excessive elements to disk (though it obviously slows performance). – Ivan Mar 15 '19 at 14:36
  • @Thilo not too relevant to the question, but for completeness: You are right, the List needs to be synchronized (e.g. using `Collections.synchronizedList`) – Deproblemify Mar 15 '19 at 15:52
  • @Ivan what I fear when using something like EhCache is that I have to read out the complete list when appending to it – which again costs a lot of IOs and memory... – Deproblemify Mar 15 '19 at 16:08
  • @AkashShah no it's not as trivial as that, sadly. I edited the question to make it more clear. – Deproblemify Mar 15 '19 at 16:17
  • Would an embedded database (like [H2](http://www.h2database.com)) fit in your constraints? – Jiri Tousek Mar 15 '19 at 17:16

2 Answers2

1

My recollection is that the JVM runs with a small initial max heap size. If you use the -Xmx10000m you can tell the JVM to run with a 10,000 MB (or whatever number you selected) heap. If your underlying OS resources support a larger heap that might work.

zelinka
  • 3,271
  • 6
  • 29
  • 42
1
  1. You can increase the size of heap. The size of heap can be configured to larger than physical memory size of your server while you make sure the condition is right:

    the size of heap + the size of other applications < the size of physical memory + the size of swap space
    

    For instance, if the physical memory is 4G and the swap space is 4G, the heap size can be configured to 6G.

    But the program will suffer from page swapping.

  2. You can use some database like Redis. Redis is key-value database and has List structure.

    I think this is the simplest way to solve your problem.

  3. You can compress the Result instance. First, you serialize the instance and compress that. And define the class:

    class CompressResult { 
        byte[] result; 
        //... 
    } 
    

    And replace the Result to CompressResult. But you should deserialize the result when you want to use it.

    It will work well if the class Result has many fields and is very complicated.

flycash
  • 414
  • 4
  • 8
  • Hi, thanks for the great answer. I am not sure if 3) scales enough for me but I will try. Something like 2) seems to be my preferred solution, an external database feels a bit over-the-top for me, though. – Deproblemify Mar 15 '19 at 15:56
  • Is there maybe something comparable to an embedded Redis for Java? – Deproblemify Mar 15 '19 at 16:07