
I want to cache data produced by my application in memory, but if memory becomes scarce I would like to swap the data to disk.

Ideally I would want to be notified by the VM when it needs memory, so that I could write my data to disk and free some memory that way. But I don't see any way to hook into the VM that notifies me before an OutOfMemoryError occurs somewhere (most likely in code not related to the cache in any way).

The Reference classes in java.lang.ref do not seem to be of any use in this case; their notification mechanism (ReferenceQueue) only triggers after the referent has already been reclaimed by the GC. By then it would be too late to save the data to disk.

What alternatives are available to manage the heap memory efficiently? (do not swap to disk until absolutely unavoidable)


Edit 1: In response to the comment "The OS already does that for you" - this only covers part of the issue. The amount of memory the OS can allocate is a limited resource, and there are limits other than the memory available to the OS that need to be considered here:

  • The limit imposed by the VM's architecture (32-bit VM)
  • The limit on memory that can be allocated to the VM's process (32-bit OS)
  • The limit possibly imposed on the VM via the -Xmx option

Simply running the VM with an unlimited heap size will not prevent it from running out of memory; even if the OS still has plenty available, it may not be available to the VM for the above reasons.

Aleš
Durandal
    Most operating systems handle this for you. Why do you wish to reimplement it? – George Cummins Dec 08 '11 at 13:53
  • And to reinforce the comment above: http://en.wikipedia.org/wiki/Paging – jweyrich Dec 08 '11 at 13:54
  • @George Typically, you don't get to choose what goes to disk or what doesn't. You may have some mission critical data you don't access very often that you don't want to load from virtual memory, but the OS has decided to put it there. – corsiKa Dec 08 '11 at 13:58

5 Answers


I recommend using the JVM's API calls to monitor the amount of free memory available and acting accordingly.

See this question about how to monitor the amount of free memory available to the JVM.
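As a minimal sketch of that approach, the standard `Runtime` methods can be combined to estimate how much heap is still obtainable (the class and method names here are illustrative, not from the linked question):

```java
// Sketch: estimating the memory still available to the JVM heap.
public class MemoryMonitor {

    /** Estimates the number of bytes the heap can still grow by or hand out. */
    public static long availableHeapBytes() {
        Runtime rt = Runtime.getRuntime();
        long allocated = rt.totalMemory(); // heap currently reserved from the OS
        long free      = rt.freeMemory();  // unused portion of the reserved heap
        long max       = rt.maxMemory();   // the -Xmx ceiling
        // Memory not yet reserved, plus the unused part of what is reserved:
        return (max - allocated) + free;
    }

    public static void main(String[] args) {
        System.out.println("Available: "
                + availableHeapBytes() / (1024 * 1024) + " MB");
    }
}
```

Note that these values are only estimates: `freeMemory()` does not account for garbage that a collection would reclaim, which is exactly the weakness discussed in the comment below.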

Tudor
  • I'm hoping for a better way. Monitoring the free memory implies that the cache always needs to keep as much memory free as the application could allocate in the worst case between checks. – Durandal Dec 08 '11 at 14:31

You can write a thread that checks for free memory repeatedly and acts if a limit is passed.
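A rough sketch of such a watchdog thread, assuming the caller supplies the eviction logic (the class name, threshold, and poll interval are all illustrative choices):

```java
// Sketch: a daemon thread that polls free memory and triggers eviction
// when the available fraction of the heap drops below a threshold.
public class MemoryWatchdog extends Thread {
    private final Runnable evictAction;   // e.g. spill the oldest cache entries to disk
    private final double minFreeFraction; // act when less than this fraction is free

    public MemoryWatchdog(Runnable evictAction, double minFreeFraction) {
        this.evictAction = evictAction;
        this.minFreeFraction = minFreeFraction;
        setDaemon(true); // don't keep the JVM alive just for the watchdog
    }

    @Override
    public void run() {
        Runtime rt = Runtime.getRuntime();
        while (!isInterrupted()) {
            long available = rt.maxMemory() - rt.totalMemory() + rt.freeMemory();
            if ((double) available / rt.maxMemory() < minFreeFraction) {
                evictAction.run();
            }
            try {
                Thread.sleep(1000); // poll once per second
            } catch (InterruptedException e) {
                return;
            }
        }
    }
}
```

The caveat from the comment above still applies: between two polls the application could allocate faster than the watchdog can react.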

Angelo Fuchs

I would use an internal database (Derby comes to mind for development purposes, replacing it with your chosen flavor for deployment). Typically they have this functionality built in already, and you can configure how much of the database to keep cached in memory.

corsiKa
  • To the best of my knowledge databases (and Derby should be no exception) always commit everything to disk before a transaction succeeds. That is, the data exists on disk and a portion is cached in memory. That's the subtle but in my case decisive difference: I don't want to write the data to disk at all if memory suffices. – Durandal Dec 08 '11 at 14:28

That's a very difficult thing to do in pure Java, for the reasons you've already hinted at.

  • It is quite normal for the heap to become nearly full before GC kicks in, so the only way you can determine how much free memory is really available is to do a GC (and you don't want to do that too often). You could use the CMSInitiatingOccupancyFraction option to make sure a CMS collection happens when the old generation is (say) 80% full - you could then assume the value of "free memory" returned by the Management API is probably about right (for occupancies above 80%). But there's no guarantee, of course.

  • As you mentioned, soft references are automatically cleared by the collector before being added to the queues with which they are registered, so they aren't particularly helpful here. You could create a dummy SoftReference and use its enqueuing as an indication that memory is low. But I'm not sure about the timing - could you guarantee to dump all of your data to disk before the JVM runs out of memory? Probably not.
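For concreteness, the dummy-SoftReference "canary" idea could be sketched like this (class name and the 1 MB payload size are illustrative; the timing caveat above still applies, since the warning only fires once the collector has already decided memory is tight):

```java
import java.lang.ref.Reference;
import java.lang.ref.ReferenceQueue;
import java.lang.ref.SoftReference;

// Sketch: a dummy SoftReference whose clearing by the GC signals
// memory pressure.
public class MemoryPressureCanary {
    private final ReferenceQueue<byte[]> queue = new ReferenceQueue<>();
    private SoftReference<byte[]> canary;

    public MemoryPressureCanary() {
        replant();
    }

    private void replant() {
        // 1 MB payload; the softly-reachable ballast the GC may clear.
        canary = new SoftReference<>(new byte[1024 * 1024], queue);
    }

    /** Returns true if the collector has cleared the canary since the last check. */
    public boolean memoryIsLow() {
        Reference<? extends byte[]> cleared = queue.poll();
        if (cleared != null) {
            replant(); // re-arm for the next warning
            return true;
        }
        return false;
    }
}
```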

Could you instead flush your cache to disk when it reaches a certain size, e.g. if it exceeds 500MB then flush it?

Or could you use a MappedByteBuffer with a private mapping - the data won't then be flushed back to the file? If I remember correctly, the data you write is stored in off-heap "direct" memory (at least on Linux) and so won't consume any of your heap - but please check that. If RAM became exhausted you would of course start to use swap.
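A minimal sketch of the private-mapping suggestion, assuming a scratch file as the backing store (the file name, region size, and class name are illustrative). With MapMode.PRIVATE the mapping is copy-on-write, so writes modify private pages rather than the file, and those pages live outside the Java heap:

```java
import java.io.File;
import java.io.RandomAccessFile;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;

// Sketch: a copy-on-write (PRIVATE) mapping. Writes go to private pages
// outside the Java heap; the OS may page them to swap under pressure.
public class PrivateMappingDemo {
    public static void main(String[] args) throws Exception {
        File file = File.createTempFile("cache", ".dat");
        file.deleteOnExit();
        try (RandomAccessFile raf = new RandomAccessFile(file, "rw")) {
            raf.setLength(16 * 1024 * 1024); // 16 MB backing region
            MappedByteBuffer buf = raf.getChannel()
                    .map(FileChannel.MapMode.PRIVATE, 0, raf.length());
            buf.putInt(0, 42);                 // lands in a private COW page,
                                               // never written back to the file
            System.out.println(buf.getInt(0)); // prints 42
        }
    }
}
```

Whether the copied pages count against direct-memory limits or only against the process address space is platform-dependent, so this is worth verifying on the target system.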

Paul Cager
  • Flushing to disk on a set memory usage wouldn't pose a problem, that would be my fallback plan if no better options turn up (either a percentage of max memory or just a set limit in MB). Your suggestion of MappedByteBuffers looks promising, too bad the JRE source isn't too helpful, it goes native without showing much meat. I'll definitely check them out. – Durandal Dec 08 '11 at 17:39

Have you considered using memory mapped files? See http://en.wikipedia.org/wiki/Memory-mapped_file

It solves your problem regarding not being able to access memory greater than that allocated to the VM.

Mike