How to detect memory-pressure in a java program?

Question

I have a batch process, written in Java, that analyzes extremely long sequences of tokens (maybe billions or even trillions of them!) and observes bi-gram patterns (a.k.a. word pairs).

In this code, bi-grams are represented as Pairs of Strings, using the ImmutablePair class from Apache commons. I won't know in advance the cardinality of the tokens. They might be very repetitive, or each token might be totally unique.

The more data I can fit into memory, the better the analysis will be!

But I definitely can't process the whole job at once. So I need to load as much data as possible into a buffer, perform a partial analysis, flush my partial results to a file (or to an API, or whatever), then clear my caches and start over.

One way I'm optimizing memory usage is by using Guava interners to de-duplicate my String instances.

Right now, my code looks essentially like this:

int BUFFER_SIZE = 100_000_000;

Map<Pair<String, String>, LongAdder> bigramCounts = new HashMap<>(BUFFER_SIZE);

Interner<String> interner =  Interners.newStrongInterner();

String prevToken = null;
Iterator<String> tokens = getTokensFromSomewhere();
while (tokens.hasNext()) {
  String token = interner.intern(tokens.next());
  if (prevToken != null) {
    Pair<String, String> bigram = new ImmutablePair<>(prevToken, token);
    LongAdder bigramCount = bigramCounts.computeIfAbsent(
        bigram,
        (c) -> new LongAdder()
    );
    bigramCount.increment();
    // If our buffer is full, we need to flush!
    boolean tooMuchMemoryPressure = bigramCounts.size() > BUFFER_SIZE;
    if (tooMuchMemoryPressure) {
      // Analyze the data, and write the partial results somewhere
      doSomeFancyAnalysis(bigramCounts);
      // Clear the buffer and start over
      bigramCounts.clear();
    }
  }
  prevToken = token;
}

The trouble with this code is that this is a very crude way of determining whether there is tooMuchMemoryPressure.

I want to run this job on many different kinds of hardware, with varying amounts of memory. No matter the instance, I want this code to automatically adjust to maximize the memory consumption.

Rather than using some hard-coded constant like BUFFER_SIZE (derived through experimentation, heuristics, guesswork), I actually just want to ask the JVM whether the memory is almost full. But that's a very complicated question, considering the complexities of mark/sweep algorithms, and all the different generational collectors.

What would be a good general-purpose approach for accomplishing something like this, assuming this batch-job might run on a variety of different machines, with different amounts of available memory? I don't need this to be extremely precise... I'm just looking for a rough signal to know that I need to flush the buffer soon, based on the state of the actual heap.

If it's a batch job, have you tried just getting the [free memory](https://stackoverflow.com/q/12807797/2541560)? — Kayaman, Mar 26 '22 at 22:23
How many identical bigrams do you have on average? And how many different? Instead of constructing a new `LongAdder` for every unique bigram, you might be better off using a `Map<Pair<String, String>, Long>` and `bigramCounts.merge(bigram, 1L, Long::sum);` All values up to 127 are boxed into shared canonical `Long` instances. Your link to the `Interner` does not work. But it’s clear that depending on the actual likelihood of duplicate strings, this `Interner` may do more harm than good. Using Java 9 or newer for compact strings and G1GC with StringDeduplication may gain you far more. — Holger, Mar 28 '22 at 07:33
Thanks for the suggestions @Holger, but that's not quite what I'm asking! The question as I've presented it here is a bit of a toy example. I'm not asking "how do I make this program more memory-efficient?" which could have many many possible approaches. Instead, I'm just asking "how can I detect memory pressure from my java code?" — benjismith, Mar 28 '22 at 17:38
Well, when you manage to eliminate the memory pressure, you don’t need a way to detect it. But for your literal question, look [here](https://stackoverflow.com/a/69974902/2711488) and [there](https://stackoverflow.com/a/48148171/2711488)… — Holger, Mar 29 '22 at 07:58

Harald · Accepted Answer · 2022-04-02T10:34:17.390

The simplest way to get a first glimpse of what is going on with the process's heap space is Runtime.freeMemory(), together with maxMemory() and totalMemory(). However, freeMemory() does not account for garbage that has not been collected yet, so it underestimates the memory that is really available and can be completely misleading just before the GC kicks in.
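
As a rough illustration of that first glimpse, the three Runtime values can be combined as below. This is only a sketch: the class and method names (HeapSnapshot, availableBytes, usedFraction) are made up for the example, and the result still counts uncollected garbage as used.

```java
/** Hypothetical helper: a point-in-time view of the heap using only java.lang.Runtime. */
final class HeapSnapshot {

    /** Bytes the heap can still hand out: max heap minus what is currently occupied. */
    static long availableBytes() {
        Runtime rt = Runtime.getRuntime();
        // totalMemory() is the currently committed heap; freeMemory() is the unused part of it.
        long used = rt.totalMemory() - rt.freeMemory();
        // maxMemory() is the limit the heap may grow to (for example, set with -Xmx).
        return rt.maxMemory() - used;
    }

    /** Occupied share of the maximum heap, garbage included. */
    static double usedFraction() {
        Runtime rt = Runtime.getRuntime();
        long used = rt.totalMemory() - rt.freeMemory();
        return (double) used / rt.maxMemory();
    }

    public static void main(String[] args) {
        System.out.printf("available=%d bytes, used=%.1f%%%n",
                availableBytes(), usedFraction() * 100.0);
    }
}
```

Because garbage is included, a high usedFraction() here does not necessarily mean that live data is close to the limit.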

Assuming that for your application "memory pressure" basically means "(soon) not enough", the interesting value is free memory right after a GC.

This is available by using GarbageCollectorMXBean which provides GcInfo with memory usage after the GC.

The bean can be observed right after each GC because it is a NotificationEmitter, although this is not advertised in the Javadoc. Some minimal code, patterned after a longer example, is:

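  import java.lang.management.GarbageCollectorMXBean;
  import java.lang.management.ManagementFactory;
  import java.util.List;
  import javax.management.Notification;
  import javax.management.NotificationEmitter;
  import javax.management.openmbean.CompositeData;
  import com.sun.management.GarbageCollectionNotificationInfo;
  import com.sun.management.GcInfo;

  // Subscribe to the notifications each collector bean emits after a collection.
  void registerCallback() {
    List<GarbageCollectorMXBean> gcbeans =
      ManagementFactory.getGarbageCollectorMXBeans();
    for (GarbageCollectorMXBean gcbean : gcbeans) {
      System.out.println(gcbean.getName());
      // The beans also implement NotificationEmitter, hence the cast.
      NotificationEmitter emitter = (NotificationEmitter) gcbean;
      emitter.addNotificationListener(this::handle, null, null);
    }
  }

  // Called after each GC; prints the per-pool memory usage measured right after it.
  private void handle(Notification notification, Object handback) {
    // Ignore any notification that is not a GC notification.
    if (!notification.getType()
      .equals(GarbageCollectionNotificationInfo.GARBAGE_COLLECTION_NOTIFICATION)) {
      return;
    }
    GarbageCollectionNotificationInfo info = GarbageCollectionNotificationInfo
      .from((CompositeData) notification.getUserData());
    GcInfo gcInfo = info.getGcInfo();
    gcInfo.getMemoryUsageAfterGc().forEach((name, memUsage) -> {
      System.err.println(name + "->" + memUsage);
    });
  }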

There will be several memUsage entries, and they will differ depending on the GC in use. But from the values provided (used, committed and max) we can derive upper limits on free memory, which should give the "rough signal" the OP is asking for.
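
One possible way to turn those entries into a single number is sketched below. The class and method names are invented for the example. It assumes that only pools of type HEAP should be counted (the map also contains non-heap pools such as Metaspace) and that the overall heap limit comes from Runtime.maxMemory(), because individual pools may report a max of -1.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.lang.management.MemoryUsage;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical helper: estimates free heap from GcInfo.getMemoryUsageAfterGc(). */
final class AfterGcHeadroom {

    /** Names of the memory pools that belong to the Java heap in this JVM. */
    static Set<String> heapPoolNames() {
        Set<String> names = new HashSet<>();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getType() == MemoryType.HEAP) {
                names.add(pool.getName());
            }
        }
        return names;
    }

    /** Sums the 'used' value of every heap pool in the after-GC snapshot. */
    static long usedHeapAfterGc(Map<String, MemoryUsage> afterGc, Set<String> heapPools) {
        long used = 0;
        for (Map.Entry<String, MemoryUsage> entry : afterGc.entrySet()) {
            if (heapPools.contains(entry.getKey())) {
                used += entry.getValue().getUsed();
            }
        }
        return used;
    }

    /** Rough estimate of the heap still free after the GC, given the heap limit. */
    static long freeAfterGc(Map<String, MemoryUsage> afterGc, Set<String> heapPools, long heapMax) {
        return heapMax - usedHeapAfterGc(afterGc, heapPools);
    }
}
```

Inside the notification handler this could be called as freeAfterGc(gcInfo.getMemoryUsageAfterGc(), heapPoolNames(), Runtime.getRuntime().maxMemory()). After a young-generation collection the old generation may still hold garbage, so the value is best treated as a rough signal only.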

doSomeFancyAnalysis will certainly also need its share of fresh memory, so with a very rough estimate of how much that will be per bigram to analyze, this could be the limit to watch for.
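
A minimal sketch of such a limit follows. FlushSignal, analysisBytesPerBigram and the method names are all hypothetical, and the per-bigram estimate has to come from your own measurements. A thread-safe flag is used on the assumption that GC notifications are delivered on a thread other than the one running the token loop.

```java
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical bridge between the GC listener and the processing loop. */
final class FlushSignal {

    // Assumed, experimentally derived estimate of the memory the analysis needs per bigram.
    private final long analysisBytesPerBigram;

    // Set by the GC listener, read and cleared by the processing loop.
    private final AtomicBoolean flushRequested = new AtomicBoolean(false);

    FlushSignal(long analysisBytesPerBigram) {
        this.analysisBytesPerBigram = analysisBytesPerBigram;
    }

    /** Call from the GC listener with the free heap measured after the collection. */
    void onGc(long freeAfterGcBytes, long bufferedBigrams) {
        long neededForAnalysis = bufferedBigrams * analysisBytesPerBigram;
        if (freeAfterGcBytes < neededForAnalysis) {
            flushRequested.set(true);
        }
    }

    /** Call from the processing loop; returns true once per request and resets the flag. */
    boolean consumeFlushRequest() {
        return flushRequested.getAndSet(false);
    }
}
```

In the question's loop, the check bigramCounts.size() > BUFFER_SIZE could then be replaced by signal.consumeFlushRequest(). The bigram count passed to onGc should come from something that is safe to read across threads (for instance a volatile counter updated by the loop), not from calling size() on the HashMap in the listener.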

This was super helpful! I followed the link to your longer example, and used that code to add lots of memory logging into my application. Then I observed it running with those logs in place, and found a good threshold of available memory where my application should flush the buffer and perform the analysis. — benjismith, Jun 03 '22 at 18:25
