
I'm testing a custom Eclipse RCP application. It does some simple initialization and then starts a number of threads that parse a lot of XML files inside the workspace.

About once in 1000 executions, one of those threads crashes with a NullPointerException. This usually happens inside Xerces, sometimes in other libraries, and sometimes inside the Java standard library. The problem is that those NullPointerExceptions seem to happen on lines where no pointer is dereferenced. For example:

java.lang.NullPointerException
    at java.util.concurrent.locks.ReentrantReadWriteLock$Sync$HoldCounter.<init>(ReentrantReadWriteLock.java:279)
    at java.util.concurrent.locks.ReentrantReadWriteLock$Sync$ThreadLocalHoldCounter.initialValue(ReentrantReadWriteLock.java:289)
    at java.util.concurrent.locks.ReentrantReadWriteLock$Sync$ThreadLocalHoldCounter.initialValue(ReentrantReadWriteLock.java:286)
    at java.lang.ThreadLocal.setInitialValue(ThreadLocal.java:180)
    at java.lang.ThreadLocal.get(ThreadLocal.java:170)
    at java.util.concurrent.locks.ReentrantReadWriteLock$Sync.tryAcquireShared(ReentrantReadWriteLock.java:481)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireShared(AbstractQueuedSynchronizer.java:1282)
    at java.util.concurrent.locks.ReentrantReadWriteLock$ReadLock.lock(ReentrantReadWriteLock.java:727)
    at org.eclipse.osgi.container.ModuleDatabase.readLock(ModuleDatabase.java:744)
    at org.eclipse.osgi.container.ModuleDatabase.getWiring(ModuleDatabase.java:431)
    at org.eclipse.osgi.container.ModuleContainer.getWiring(ModuleContainer.java:398)
    at org.eclipse.osgi.container.ModuleRevision.getWiring(ModuleRevision.java:137)
    at org.eclipse.osgi.container.ModuleWire.getProviderWiring(ModuleWire.java:51)
    at org.eclipse.osgi.internal.loader.BundleLoader.findRequiredSource(BundleLoader.java:1114)
    at org.eclipse.osgi.internal.loader.BundleLoader.findClassInternal(BundleLoader.java:392)
    at org.eclipse.osgi.internal.loader.BundleLoader.findClass(BundleLoader.java:352)
    at org.eclipse.osgi.internal.loader.BundleLoader.findClass(BundleLoader.java:344)
    at org.eclipse.osgi.internal.loader.ModuleClassLoader.loadClass(ModuleClassLoader.java:160)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
    at org.eclipse.core.internal.resources.ProjectContentTypes.usesContentTypePreferences(ProjectContentTypes.java:116)
    at org.eclipse.core.internal.resources.ContentDescriptionManager.getDescriptionFor(ContentDescriptionManager.java:321)
    at org.eclipse.core.internal.resources.File.getContentDescription(File.java:255)
    at my_app.ModelParser.getContentType(ModelParser.java:54)
    at my_app.ModelParser.parse(ModelParser.java:43)
    at my_app.ValidationModelsCache.getModel(ValidationModelsCache.java:44)
    at my_app.BuilderContext.getParseResult(BuilderContext.java:37)
    at my_app.ValidationHandler.validate(ValidationHandler.java:37)
    at my_app.ProjectValidationBuilder$1.run(ProjectValidationBuilder.java:57)
    at org.eclipse.core.internal.jobs.Worker.run(Worker.java:55)

Nothing can be null in line 279. In fact, there is not a single dereference in the whole method:

276:    static final class HoldCounter {
277:        int count = 0;
278:        // Use id, not reference, to avoid garbage retention
279:        final long tid = getThreadId(Thread.currentThread());
280:    }

I've double- and triple-checked that I have the right sources. I've even disassembled some of those methods, and there doesn't seem to be any way null could be dereferenced there.

Here's another example:

Caused by: java.lang.NullPointerException
    at com.google.common.collect.ObjectArrays.checkElementsNotNull(ObjectArrays.java:233)
    at com.google.common.collect.ObjectArrays.checkElementsNotNull(ObjectArrays.java:226)
    at com.google.common.collect.ImmutableList.construct(ImmutableList.java:303)
    at com.google.common.collect.ImmutableList.of(ImmutableList.java:98)
    at com.google.common.collect.Iterables.concat(Iterables.java:432)

Line 233 is just a return statement:

229:      static Object[] checkElementsNotNull(Object[] array, int length) {
230:            for (int i = 0; i < length; i++) {
231:              checkElementNotNull(array[i], i);
232:            }
233:            return array;
234:      }

So far this only seems to happen on one machine:

CPU: Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz
Linux: 4.9.0-2-amd64 #1 SMP Debian 4.9.18-1 (2017-03-30) x86_64 GNU/Linux
Java:
    openjdk version "1.8.0_121"
    OpenJDK Runtime Environment (build 1.8.0_121-8u121-b13-4-b13)
    OpenJDK 64-Bit Server VM (build 25.121-b13, mixed mode)

but it reproduces on a couple of different Java and kernel versions.

What could be causing this behavior, and how can I debug it?

Does OpenJDK have an option like IBM's -Xdump so I can obtain a core dump when the problematic NullPointerException happens?

Is there some trick to set a gdb breakpoint on NullPointerException? I guess jdb won't catch it early enough.

Could this be related to the JVM's implicit null checks? Is there some flag to disable them (-Xrs doesn't seem to work)?

Piotr Praszmo
  • `279: final long tid = getThreadId(Thread.currentThread());` can be null. If getThreadId() returns a Long, the returned value can be null and will cause an NPE when it is unboxed to a primitive long. – Wietlol Apr 10 '17 at 13:48
  • [getThreadId](http://hg.openjdk.java.net/jdk8u/jdk8u/jdk/file/80280d8b40e9/src/share/classes/java/util/concurrent/locks/ReentrantReadWriteLock.java#l1492) returns `long`. – Piotr Praszmo Apr 10 '17 at 14:01
  • It may be that your stack traces somehow get mangled, so the trace you see doesn't actually have anything to do with the place that produced the exception. – M. Prokhorov Apr 10 '17 at 14:08
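
For reference, the unboxing scenario raised in the first comment can be sketched as follows. This is only an illustration of how an NPE can appear on a line with no visible dereference; `boxedThreadId` is a hypothetical stand-in that returns a boxed `Long`, whereas the real `getThreadId` returns a primitive `long`, so this mechanism does not explain the trace in the question.

```java
final class UnboxingNpeDemo {
    // Hypothetical helper (not part of the JDK): returns a boxed Long that may be null.
    static Long boxedThreadId(boolean available) {
        return available ? Long.valueOf(Thread.currentThread().getId()) : null;
    }

    // Returns true if assigning the boxed result to a primitive long throws an NPE.
    static boolean unboxingThrows(boolean available) {
        try {
            // The compiler inserts a call to Long.longValue() here; with a null
            // reference that call fails even though the source shows no dereference.
            long tid = boxedThreadId(available);
            return tid != tid; // always false; only reached when unboxing succeeded
        } catch (NullPointerException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("null Long throws on unboxing: " + unboxingThrows(false));
        System.out.println("non-null Long throws on unboxing: " + unboxingThrows(true));
    }
}
```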

1 Answer

What could be causing this behavior

An instrumentation agent, a hardware bug, or a SIGSEGV signal somehow sent to the process.

Does OpenJDK have an option like IBM's -Xdump so I can obtain a core dump when the problematic NullPointerException happens?

-XX:AbortVMOnException=java.lang.NullPointerException, but this option is available in non-product builds only.

Is there some trick to set a gdb breakpoint on NullPointerException?

You may try to set a breakpoint at the following functions:

  • Runtime1::throw_null_pointer_exception(JavaThread*)
  • SharedRuntime::throw_NullPointerException(JavaThread*)
  • SharedRuntime::throw_NullPointerException_at_call(JavaThread*)

However, the exception may also be thrown from many other places.

A better way is to set up a JVM TI callback that will be invoked on every thrown exception. Here is an example of a JVM TI agent that intercepts exceptions.

Could this be related to the JVM's implicit null checks? Is there some flag to disable them?

This is probably related. Implicit null checks may be disabled with -XX:-ImplicitNullChecks, but the flag is again available only in debug builds of the JVM.
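
As background for why implicit null checks matter here: in JIT-compiled code HotSpot usually emits no explicit null test before a field or method access. It lets the access fault and has its SIGSEGV handler turn the fault into a NullPointerException. The sketch below only illustrates that code path with a deliberate null; the class name, method names, and warm-up count are illustrative assumptions, and it does not reproduce the spurious NPE from the question.

```java
final class ImplicitNullCheckDemo {
    // Small hot method. Once compiled, HotSpot typically relies on an implicit
    // (fault-based) null check for the receiver instead of an explicit test.
    static int length(String s) {
        return s.length();
    }

    // Calls length() often enough that the JIT is likely to compile it,
    // then passes null and reports whether a NullPointerException was raised.
    static boolean throwsNpeAfterWarmup(int warmupIterations) {
        long sink = 0;
        for (int i = 0; i < warmupIterations; i++) {
            sink += length("warmup");
        }
        try {
            length(null);
            return false;
        } catch (NullPointerException expected) {
            // Also confirm the warm-up loop really ran ("warmup" has 6 characters).
            return sink == 6L * warmupIterations;
        }
    }

    public static void main(String[] args) {
        System.out.println("NPE raised after warm-up: " + throwsNpeAfterWarmup(200_000));
    }
}
```

The observable behavior is the same whether the check is implicit or explicit. The difference is only in how the VM detects the null, which is why a stray fault inside compiled code could plausibly be reported as an NPE at an unrelated line.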

apangin
  • I still got crashes with `-XX:-ImplicitNullChecks`. I can reproduce this only on optimized OpenJDK builds; it doesn't happen on debug builds or on OracleJDK. I managed to get a core dump with a slightly modified `-XX:AbortVMOnException`. Examined in gdb, it looks like a legitimate NPE. I was not able to get any of the HotSpot tools to work. I finally gave up and no longer run my app in that exact configuration. – Piotr Praszmo Aug 08 '17 at 19:23