1

I am seeing very high latencies when invoking java.util.logging.Logger.log() in some instances, in the following code:

private static Object[] NETWORK_LOG_TOKEN = new Object[] {Integer.valueOf(1)};

private final TimeProbe probe_ = new TimeProbe();

public void onTextMessagesReceived(ArrayList<String> msgs_list) {
    final long start_ts = probe_.addTs(); // probe A

    // Loop through the messages
    for (String msg: msgs_list) {
        probe_.addTs(); // probe B
        log_.log(Level.INFO, "<-- " + msg, NETWORK_LOG_TOKEN);
        probe_.addTs(); // probe C

        // Do some work on the message ...
        probe_.addTs(); // probe D
    }
        
    final long end_ts = probe_.addTs(); // probe E
    if (end_ts - start_ts >= 50)  {
        // If the run was slow (>= 50 millis) we print all the recorded timestamps
        log_.info(probe_.print("Slow run with " + msgs_list.size() + " msgs: "));
    }
    probe_.clear();
}

The probe_ is simply an instance of this very basic class:

public class TimeProbe {
    final ArrayList<Long> timestamps_ = new ArrayList<>();
    final StringBuilder builder_ = new StringBuilder();

    public void addTs() {
        final long ts = System.currentTimeMillis();
        timestamps_.add(ts);
        return ts;
    }

    public String print(String prefix) {
        builder_.setLength(0);
        builder_.append(prefix);
        for (long ts: timestamps_) {
            builder_.append(ts);
            builder_.append(", ");
        }
        builder_.append("in millis");
        return builder_.toString();
    }

    public void clear() {
        timestamps_.clear();
    }
}

And here is the handler that logs the NETWORK_LOG_TOKEN entries:

final FileHandler network_logger = new FileHandler("/home/users/dummy.logs", true);
    network_logger2.setFilter(record -> {
        final Object[] params = record.getParameters();
        // This filter returns true if the params suggest that the record is a network log
        // We use Integer.valueOf(1) as our "network token"
        return (params != null && params.length > 0 && params[0] == Integer.valueOf(1));
    });

In some cases, I am getting the following ouputs (adding labels with probe A,B,C,D,E to make things more clear):

//                        A              B               C              D             B              C              D             E         
slow run with 2 msgs: 1616069594883, 1616069594883, 1616069594956, 1616069594957, 1616069594957, 1616069594957, 1616069594957, 1616069594957

Everything takes less than 1ms, except for the line of code between B and C (during the first iteration of the for loop), which takes a whopping 73 milliseconds. This does not occur every time onTextMessagesReceived() is called, but the fact that it does is big problem. I would welcome any ideas explaining where this lack of predictability comes from.

As a side note, I have checked that my disk IO is super low, and no GC pause occurred around this time. I would think my NETWORK_LOG_TOKEN setup is pretty flimsy at best in terms of design, but I still cannot think of reasons why sometimes, this first log line takes forever. Any pointers or suggestions as to what could be happening would be really appreciated :)!

Konrad Rudolph
  • 530,221
  • 131
  • 937
  • 1,214
MarkoPaulo
  • 474
  • 4
  • 19
  • 1
    Is that the only log statement that takes that long? Is this unusual for a logging statement in your configuration? How do you know it's the logging and not the way you've added the ts to the arraylist? How about running this 10k times and seeing if it settles down? It could be related to jit or other optimisations kicking in at the start of the loop when it realises that you're about to loop, an effect that would be ironed out over thousands of runs. In general, you can't run code once and expect a profile to be reasonable or accurate. – Software Engineer Mar 18 '21 at 17:15
  • 1
    Have you looked at the granularity of `System.currentTimeMillis()` on your system? If this is rare, it could have been an NTP sync, try using `nanoTime()` instead. It could have been a million things, try to use https://github.com/giltene/jHiccup and look at its output. If nothing shows up, it's of course possible that the logger is actually blocked on IO (writing its buffer out), use a profiler to confirm. – Petr Janeček Mar 18 '21 at 17:16

2 Answers2

2

Things to try:

  1. Enable JVM safepoint logs. VM pauses are not always caused by GC.

  2. If you use JDK < 15, disable Biased Locking: -XX:-UseBiasedLocking. There are many synchronized places in JUL framework. In a multithreaded application, this could cause biased lock revocation, which is a common reason for a safepoint pause.

  3. Run async-profiler in the wall-clock mode with .jfr output. Then, using JMC, you'll be able to find what a thread was exactly doing near the given moment of time.

  4. Try putting a log file onto tmpfs to exclude disk latency, or use MemoryHandler instead of FileHandler to check whether file I/O affects pauses at all.

apangin
  • 92,924
  • 10
  • 193
  • 247
  • Thank you @apangin, I have been playing around with async-profiler, but I haven't been able to figure out how to see what a specific thread is doing at a specific time. Seeing jfr output in jmc gives me some interesting info, but even the thread tab doesn't seem to give me granular enough info to see what a thread is doing at a given time. – MarkoPaulo Mar 20 '21 at 23:45
  • 1
    @MarkoPaulo You may use Event Browser menu in JMC to list all Method Profiling Samples, sort them by 'Start Time', and look for events around the desired interval. – apangin Mar 21 '21 at 11:19
2

Everything takes less than 1ms, except for the line of code between B and C (during the first iteration of the for loop), which takes a whopping 73 milliseconds. [snip] ...but I still cannot think of reasons why sometimes, this first log line takes forever.

The first log record that is published to the root logger or its handlers will trigger lazy loading of the root handlers.

If you don't need to publish to the root logger handlers then call log_.setUseParentHandlers(false) when you add your FileHandler. This will make it so your log records don't travel up to the root logger. It also ensures that you are not publishing to other handlers attached to the parent loggers.

You can also load the root handlers by doing Logger.getLogger("").getHandlers() before you start your loop. You'll pay the price for loading them but at a different time.

log_.log(Level.INFO, "<-- " + msg, NETWORK_LOG_TOKEN);

The string concatenation in this line is going to do array copies and create garbage. Try to do:

log_.log(Level.INFO, msg, NETWORK_LOG_TOKEN);

The default log method will walk the current thread stack. You can avoid that walk by using logp​ methods in tight loops:

public Foo {
   private static final String CLASS_NAME = Foo.class.getName();
   private static final Logger log_ = Logger.getLogger(CLASS_NAME);

   public void onTextMessagesReceived(ArrayList<String> msgs_list) {
       String methodName = "onTextMessagesReceived";
       // Loop through the messages
       for (String msg: msgs_list) {
          probe_.addTs(); // probe B
          log_.logp(Level.INFO, CLASS_NAME, methodName, msg, NETWORK_LOG_TOKEN);
          probe_.addTs(); // probe C

          // Do some work on the message ...
          probe_.addTs(); // probe D
      }
   }
}

In your code you are attaching a filter to the FileHandler. Depends on the use case but loggers also accept filters. Sometimes it makes sense to install a filter on the logger instead of the handler if you are targeting a specific message.

jmehrens
  • 10,580
  • 1
  • 38
  • 47