3

I have read a lot about buffer leak detection in Netty 4, and to me it looks like there is no deterministic way to detect such leaks in unit tests.

However, such a feature is so important for unit testing that it feels very wrong that there are no clear guidelines on how to do it.

Moreover, most of the sources, including the original document http://netty.io/wiki/reference-counted-objects.html, make things amazingly confusing by giving vague hints like:

"The following output shows a leak from our unit test (XmlFrameDecoderTest.testDecodeWithXml()):"

which sounds like there exists a way to deterministically detect leaks in buffer allocators in unit tests, while in fact there is no such thing, as Trustin Lee himself notes in his answers (link below).

No wonder this was reposted dozens of times by various sources that have no idea and just copy-paste words without testing.

*) In the topic "Netty 4/5 does not actually detect resource leak of bytebuf?", Trustin Lee suggested running an application for 30 seconds under some busy workload. However, that doesn't trigger detection or any output from ResourceLeakDetector for me.

*) I also tried the GC trick suggested in the topic "How to force garbage collection in Java?", but that doesn't make any difference either.

GC is so unpredictable that it's very tough to imagine how ResourceLeakDetector can be leveraged to create clean and thorough buffer leak unit tests.

*) Another way is to test refCnt for every single ByteBuf that was created while a test was running. But sometimes it's impossible to get hold of every such reference. For example, an interface might declare String as an input parameter, and its implementation would then create and release a ByteBuf instance internally. That reference is unreachable for the unit test, so if the release never happens, the resulting leak cannot be detected from the test.

*) I also couldn't find an easy way to get a list of all existing buffers from the allocator, otherwise it would be possible to just check refCnt for every single one of them.


I wonder if anybody can share best practices that work deterministically and can practically be used in unit tests to consistently find buffer leaks in large codebases that use Netty buffer allocators.

So far it seems to me that unit tests are useless for this purpose, unless you run a full-scale server for extended periods of time in your unit tests (and this, by the way, doesn't guarantee anything either; it just increases your chances in theory). As far as I can see, there is no good reason for such limitations on testing to exist, but we have what we have.

There is so much confusing information on this topic all over the internet, sadly, originating from Netty documentation itself, that I really want the facts to be stated in a straight and clear manner.

Even if the answer to my question is "it's impossible", just having this text available might save some people a lot of time researching.


P.S. A very simple example to demonstrate the lack of output. If someone can tell what changes will be required to have this code produce the leak output, I would be most grateful.

http://gist.github.com/codekrolik/e55b8ece07270f40aad85f691696fe6a

  • Another idea was to count the number of allocations and deallocations and make sure there are no dangling active buffers. For that reason I tried to leverage information about arenas that is accessible from allocators. However the inconsistency of the numbers returned is so mindboggling, that they for sure can't be used to produce a stable leak-aware unit test. https://gist.github.com/codekrolik/6aa035ac650ea2972e532227a355a0e4 – Birzhan Amirov May 19 '16 at 19:04
  • For example, when I run 50000 allocations and releases, the stats returned by the arenas look like this: directActive 0 directAlloc 1 directDealloc 1 heapActive 0 heapAlloc 0 heapDealloc 0. But if I comment out the release line, the output suddenly changes to: directActive 50000 directAlloc 50000 directDealloc 0 heapActive 0 heapAlloc 0 heapDealloc 0 – Birzhan Amirov May 19 '16 at 19:08

2 Answers

1

So I managed to make unit tests work for me.

The mechanism is as follows:

1) Create a PooledByteBufAllocator with caching disabled, as was suggested in https://github.com/netty/netty/issues/5275 (the last three constructor arguments are the tiny, small, and normal cache sizes, all set to 0)

PooledByteBufAllocator alloc = new PooledByteBufAllocator(true, 1, 1, 8192, 11, 0, 0, 0);

2) Make sure all the Bootstraps use this allocator

a. Client

Bootstrap clientBootstrap = new Bootstrap();
clientBootstrap.option(ChannelOption.ALLOCATOR, alloc);

b. Server

ServerBootstrap serverBootstrap = new ServerBootstrap();
serverBootstrap.option(ChannelOption.ALLOCATOR, alloc)
    .childOption(ChannelOption.ALLOCATOR, alloc);

3) After the test is done, check for leaked direct and heap buffers

assertEquals(0, getActiveDirectBuffers(alloc));
assertEquals(0, getActiveHeapBuffers(alloc));

// Needs io.netty.buffer.PoolArenaMetric and io.netty.buffer.PooledByteBufAllocator.

// Sums the counters of all direct arenas.
// A non-zero result means some direct buffers were never released.
long getActiveDirectBuffers(PooledByteBufAllocator alloc) {
    // The arena metrics return long, so accumulate in long to avoid narrowing.
    long directActive = 0, directAlloc = 0, directDealloc = 0;
    for (PoolArenaMetric arena : alloc.directArenas()) {
        directActive += arena.numActiveAllocations();
        directAlloc += arena.numAllocations();
        directDealloc += arena.numDeallocations();
    }
    System.out.println("directActive " + directActive + " directAlloc " + directAlloc + " directDealloc " + directDealloc);
    return directActive;
}

// Same check for the heap arenas.
long getActiveHeapBuffers(PooledByteBufAllocator alloc) {
    long heapActive = 0, heapAlloc = 0, heapDealloc = 0;
    for (PoolArenaMetric arena : alloc.heapArenas()) {
        heapActive += arena.numActiveAllocations();
        heapAlloc += arena.numAllocations();
        heapDealloc += arena.numDeallocations();
    }
    System.out.println("heapActive " + heapActive + " heapAlloc " + heapAlloc + " heapDealloc " + heapDealloc);
    return heapActive;
}
0

For unit tests, System.gc() has worked for me. (I'm not sure how reliable it is, but it seems to trigger garbage collection often enough in the tests to catch leaks.) However, for me the leaks in unit tests have mostly been because I forgot to release the buffer in the tests (not because the server code itself had a leak).

As you mention, an integration and/or load test might help catch any leaks in the server.
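To illustrate why System.gc() can only ever be "reliable enough": as far as I understand it, ResourceLeakDetector learns about a leak only after the garbage collector has reclaimed the unreleased buffer object, and by default it tracks just a sample of allocations. The sketch below reproduces that principle with plain JDK classes. GcLeakSketch and its methods are hypothetical names made up for illustration; this is not Netty code.

```java
import java.lang.ref.Reference;
import java.lang.ref.ReferenceQueue;
import java.lang.ref.WeakReference;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical stand-in for a GC-based leak detector (illustration only). */
final class GcLeakSketch {
    // The GC enqueues a weak reference here after its referent is collected.
    private final ReferenceQueue<Object> queue = new ReferenceQueue<>();
    // References to resources that have been tracked but not yet released.
    private final Set<Reference<?>> open = ConcurrentHashMap.newKeySet();

    /** Starts tracking a resource and returns the handle needed to release it. */
    public WeakReference<Object> track(Object resource) {
        WeakReference<Object> ref = new WeakReference<>(resource, queue);
        open.add(ref);
        return ref;
    }

    /** Proper release: forget the handle; a cleared reference is never enqueued. */
    public void release(WeakReference<Object> ref) {
        open.remove(ref);
        ref.clear();
    }

    /**
     * Counts resources that were collected without a prior release().
     * A leak shows up here only after the GC has actually collected the object.
     */
    public int pollLeaks() {
        int leaks = 0;
        Reference<?> ref;
        while ((ref = queue.poll()) != null) {
            if (open.remove(ref)) {
                leaks++;
            }
        }
        return leaks;
    }
}
```

With this sketch, a resource that was released is never reported, no matter when the GC runs. A resource that was dropped without release is reported only if a collection happened before pollLeaks() was called, and System.gc() is a request that the JVM is free to handle as it sees fit. That is why a test built on this mechanism can miss a leak on one run and catch it on the next.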

Josh Wilson
  • Here's my very simple example that doesn't work. Would you mind modifying it so it works? https://gist.github.com/codekrolik/e55b8ece07270f40aad85f691696fe6a – Birzhan Amirov May 19 '16 at 00:44