5

Edit:

As @Petesh said, I had reached the kern.num_taskthreads limit, which limits the number of threads for an individual process, rather than the overall thread limit.

The sysctl kern.num_taskthreads is:

kern.num_taskthreads: 2048

And when I used the VM argument -XX:ThreadStackSize=1g, I could only create 122 threads; with -XX:ThreadStackSize=2g, only 58 threads were created. That seems reasonable.

But it's still strange that no matter how I change the -Xss argument, the result is always 2031. It seems -Xss only affects the main thread, though I'm not sure about that yet.

Original question:

I ran a test to find out how many threads one JVM can create. When I adjusted the JVM args -Xmx and -Xss, the result didn't change.

Here is the code:

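import java.util.concurrent.TimeUnit;

public class ThreadTest {
    public static void main(String[] args) {
        int count = 0;
        try {
            while (true) {
                Thread thread = new Thread(new Runnable() {
                    @Override
                    public void run() {
                        try {
                            // Keep the thread alive so it keeps counting against the limit.
                            TimeUnit.SECONDS.sleep(360);
                        } catch (InterruptedException e) {
                            e.printStackTrace();
                        }
                    }
                });
                thread.start();
                // Count each thread that was started successfully.
                count++;
                System.out.println(count);
            }
        } catch (Error e) {
            // Thread creation failed (usually an OutOfMemoryError); the last printed count is the result.
            e.printStackTrace();
        }
    }
}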

And the OS info:

  • Model Name: MacBook Pro
  • Model Identifier: MacBookPro11,4
  • Processor Name: Intel Core i7
  • Processor Speed: 2.2 GHz
  • Number of Processors: 1
  • Total Number of Cores: 4
  • L2 Cache (per Core): 256 KB
  • L3 Cache: 6 MB
  • Memory: 16 GB

The java version:

➤ java -version
java version "1.8.0_60"
Java(TM) SE Runtime Environment (build 1.8.0_60-b27)
Dynamic Code Evolution 64-Bit Server VM (build 25.71-b01-dcevmlight-1, mixed mode)

The result: [image]

The ulimit -a: [image]

The sysctl kern.num_threads:

kern.num_threads: 10240
zhumengzhu
  • 1
    It sounds like you're bumping into the `kern.num_taskthreads` limit rather than the overall thread limit, which limits the number of threads for an individual process. This value is *not* a tunable, without effort. – Anya Shenanigans May 24 '16 at 04:45
  • 1
    By design whatever limits there are will be a platform-specific implementation detail. Whatever numbers or answers you get, don't expect them to be stable across platforms or Java releases. – dimo414 May 24 '16 at 05:28
  • @Petesh You are right. One more question: does Linux have a similar limitation? – zhumengzhu May 24 '16 at 07:24
  • 2
    Linux doesn't have a per-process thread limit per-se. The max number of processes would effectively set a thread limit for the user; and the `/proc/sys/kernel/threads-max` is the overall limit. – Anya Shenanigans May 24 '16 at 07:32
  • @Petesh I just ran another test with the VM args `-XX:ThreadStackSize=4g` and got the error after creating 24 threads. But when I used `-Xss4g`, I got the error after creating 2032 threads. Do you have any idea about the difference? – zhumengzhu May 24 '16 at 07:45

2 Answers

4

All this stuff is OS specific. In the case of OSX, there is a per-process thread limit, given by the sysctl kern.num_taskthreads, that can't be exceeded. The number of threads that you created, plus the overhead of VM-created threads, seems to indicate that you're reaching that limit.

The difference between -XX:ThreadStackSize and -Xss<size> is a bit odd. In this case I'm basing my analysis on the OSX oracle java vm (you're indicating that you're running with a different VM).

-Xss sets the stack size to that number of bytes. The variable storing it divides it by 1024. However, because of the way it's calculated, the stored value ends up meaningless for large sizes (64bit jvm, checked on linux and osx) - this is some wonderfully bad overflow math:

for i in {1..8}; do echo "${i}G:"; java -Xss${i}g -XX:+PrintFlagsFinal -version 2>&1 | grep ' ThreadStack'; done
1G:
     intx ThreadStackSize                          := 1048576                             {pd product}
2G:
     intx ThreadStackSize                          := 18014398507384832                    {pd product}
3G:
     intx ThreadStackSize                          := 18014398508433408                    {pd product}
4G:
     intx ThreadStackSize                          := 0                                   {pd product}
5G:
     intx ThreadStackSize                          := 1048576                             {pd product}
6G:
     intx ThreadStackSize                          := 18014398507384832                    {pd product}
7G:
     intx ThreadStackSize                          := 18014398508433408                    {pd product}
8G:
     intx ThreadStackSize                          := 0                                   {pd product}

When we compare this with -XX:ThreadStackSize we have a different picture:

Firstly, these values are scaled by a factor of 1024 - i.e. all values requested are actually a number of KB for the stack size.

This means that -XX:ThreadStackSize needs to be specified a factor of 1024 lower than the value given to -Xss. The fact that you were only able to create a fraction of the number of threads, together with the virtual memory size of the process, makes this obvious (taken from the vmmap output of the process):

Stack                  0000000800004000-0000040800000000 [  4.0T] rw-/rwx SM=NUL  thread 23
Stack                  0000040800000000-0000040800003000 [   12K] rw-/rwx SM=PRV  thread 23

4TB per stack? That's going to hurt (this is what you'd previously asked for with -XX:ThreadStackSize=4g).
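The 4TB figure follows directly from the unit scaling. As a minimal sketch (the class and method names are made up for illustration), assuming the `g` suffix multiplies by 2^30 and the flag value is then read as a number of KB:

```java
public class StackSizeUnits {
    private static final long KB = 1024L;

    /** -XX:ThreadStackSize is a count of KB, so the stack size in bytes is value * 1024. */
    static long stackBytes(long threadStackSizeFlag) {
        return threadStackSizeFlag * KB;
    }

    public static void main(String[] args) {
        long flag = 4L << 30;            // "4g" as a plain number: 4 * 2^30
        long bytes = stackBytes(flag);   // interpreted as KB, so 4 * 2^40 bytes
        System.out.println(bytes / (1L << 40) + " TB per thread stack");
    }
}
```

With `-XX:ThreadStackSize=4m` the same calculation gives 4GB per stack, which is what `-Xss4g` was meant to request.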

Once we adjust it down by a factor of 1024, we get the same number of threads in the second run - you can see these numbers far more clearly in the output and they linearly scale with the requested size:

for i in {1..8}; do echo "${i}G:"; java -XX:ThreadStackSize=${i}m -XX:+PrintFlagsFinal -version 2>&1 | grep ' ThreadStack'; done
1G:
     intx ThreadStackSize                          := 1048576                             {pd product}
2G:
     intx ThreadStackSize                          := 2097152                             {pd product}
3G:
     intx ThreadStackSize                          := 3145728                             {pd product}
4G:
     intx ThreadStackSize                          := 4194304                             {pd product}
5G:
     intx ThreadStackSize                          := 5242880                             {pd product}
6G:
     intx ThreadStackSize                          := 6291456                             {pd product}
7G:
     intx ThreadStackSize                          := 7340032                             {pd product}
8G:
     intx ThreadStackSize                          := 8388608                             {pd product}

So, it looks like using -Xss<size> is really only useful when you're looking for a stack size of 1GB or less; if you're looking for a stack size larger than 1GB, you can specify it explicitly with -XX:ThreadStackSize.

Figuring out the overflow. The code that parses the Xss option:

julong long_ThreadStackSize = 0;
ArgsRange errcode = parse_memory_size(tail, &long_ThreadStackSize, 1000);

Then in an act of stellar muppetry it does:

FLAG_SET_CMDLINE(intx, ThreadStackSize,
                          round_to((int)long_ThreadStackSize, K) / K);

i.e. it downcasts the long to an int, which it then passes to round_to. This takes a Register value, which is a 64bit value on the 64bit VM. So, from what I can tell, the value you start with (for 2GB) is:

0x80000000

Gets sign extended to:

0xFFFFFFFF80000000

This gets divided by 1024 (0x400):

0x3FFFFFFFE00000 == 18,014,398,507,384,832

so you can see where the 2GB value in the prior script comes from.
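The same arithmetic can be reproduced in Java, whose `int` to `long` conversion sign-extends in the same way. This is a minimal sketch, not the HotSpot code: the class and method names are made up for illustration, and the final step is modelled as an unsigned shift by 10 bits because that is what reproduces the values printed by `-XX:+PrintFlagsFinal` above.

```java
public class XssOverflowDemo {
    /**
     * Hypothetical model of how a -Xss byte count ends up in ThreadStackSize (KB).
     * It only mirrors the cast-then-divide steps described in the answer.
     */
    static long storedKilobytes(long requestedBytes) {
        int truncated = (int) requestedBytes; // keeps only the low 32 bits
        long widened = truncated;             // sign-extends back to 64 bits
        return widened >>> 10;                // unsigned divide by 1024
    }

    public static void main(String[] args) {
        for (int gb = 1; gb <= 8; gb++) {
            long bytes = (long) gb << 30;
            System.out.println(gb + "G: " + storedKilobytes(bytes));
        }
    }
}
```

For 1G to 4G this gives 1048576, 18014398507384832, 18014398508433408 and 0, and the pattern then repeats for 5G to 8G because only the low 32 bits survive the cast.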

I've logged a bug. The change needed in the source is to cast to (Register)long_ThreadStackSize rather than (int)long_ThreadStackSize, which keeps the calculation correct.

Anya Shenanigans
  • I used the `HotSpot VM` too, just with a [DCE VM](http://ssw.jku.at/dcevm/) installed. I ran your code and got the same result. But I still can't understand it. It seems some overflow happened. I have tried to read the source code referred to by the mailing list: [What the difference between -Xss and -XX:ThreadStackSize is?](http://mail.openjdk.java.net/pipermail/hotspot-dev/2011-June/004272.html). But I couldn't find where the overflow happens. – zhumengzhu May 30 '16 at 02:25
  • 1
    I added a couple of paragraphs on how the overflow comes about - it's a sign extension bug caused by casting the long value into an int, and then having sign-extension turn it into a huge value. – Anya Shenanigans May 30 '16 at 07:19
  • I am not really familiar with `C++`, so can you tell me where I can find the definition of the `round_to` function? It seems to be a very obvious bug in the code... – zhumengzhu Jun 01 '16 at 08:09
  • And what does the type `Register` mean? Is it an unsigned long value? – zhumengzhu Jun 01 '16 at 08:45
  • 1
    It's [defined in this file](http://hg.openjdk.java.net/jdk8u/hs-dev/hotspot/file/ae5624088d86/src/cpu/x86/vm/macroAssembler_x86.cpp#l3816) for x86 platforms. The class `Register` corresponds to a 32bit value when run in 32bit code, and a 64bit value when run in 64bit code. The bug is in the calling of `round_to` - the explicit cast to `int` followed by `C++`'s sign extension when calling the routine to create a `Register` is what mangles the value and that's got nothing to do with the actual code of `round_to`. – Anya Shenanigans Jun 01 '16 at 09:00
  • Thanks for your explanation. I understand that if an `int` is cast to an `unsigned long` value, sign extension would happen. But how is that possible here? I know that in Java, if I call a method defined as `void round_to(long num)` with an `int` argument, the `int` value is implicitly cast to `long` first. But this won't happen between a primitive type and a custom type. If `Register` is a user-defined class, how could an implicit cast happen? Is this a difference between `Java` and `C++`? – zhumengzhu Jun 01 '16 at 10:15
  • No, an `int` cast to a `signed long` would cause the sign extension. You can define constructors that will convert from one data type to the class in question in `C++`. In this case, the data type that's being used is not `unsigned long` but `intptr_t`, a signed equivalent to the size of a pointer. I've made a little C++ example to illustrate the sign extension - http://cpp.sh/8z3id note that sign extension only applies when extending a signed value into a signed data type, if it's going to be unsigned then you don't get sign extension. – Anya Shenanigans Jun 01 '16 at 10:58
  • You are right, I made a mistake here. So after the sign extension happened, the value `0x80000000` became `0xFFFFFFFF80000000`, which was still a signed value. Then when it was divided by 1024, the result `0x3FFFFFFFE00000` was also a signed value. But in the end we got a positive value, `18,014,398,507,384,832`. How did this happen? Was there another cast here? – zhumengzhu Jun 01 '16 at 15:40
  • Because the top bit of the final value was not `1`, it was interpreted as a really big positive number. If the top bit of the final value was a `1`, then it would be interpreted as a negative number. – Anya Shenanigans Jun 01 '16 at 15:54
  • Let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/113592/discussion-between-little-pig-and-petesh). – zhumengzhu Jun 02 '16 at 07:36
1

I tested it on my Linux JVM 1.8.0_92 and got the same result as you. I also found this:

What is the difference between -Xss and -XX:ThreadStackSize?

And the oracle page:

http://docs.oracle.com/javase/8/docs/technotes/tools/unix/java.html

it says:

The following examples set the thread stack size to 1024 KB in different units:

   -Xss1m
   -Xss1024k
   -Xss1048576

This option is equivalent to -XX:ThreadStackSize
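One way to check which value actually took effect is to read the flag from inside the running JVM. The following is a minimal sketch, assuming a HotSpot-based JVM where the `com.sun.management.HotSpotDiagnosticMXBean` is available; the class and method names are made up for illustration.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.lang.management.ManagementFactory;

public class StackSizeFlag {
    /** Reads the ThreadStackSize flag (a number of KB) from the running HotSpot JVM. */
    static long threadStackSizeKb() {
        HotSpotDiagnosticMXBean bean =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        // getVMOption returns the flag's current value as a string.
        return Long.parseLong(bean.getVMOption("ThreadStackSize").getValue());
    }

    public static void main(String[] args) {
        System.out.println("ThreadStackSize (KB): " + threadStackSizeKb());
    }
}
```

Running it once with `-Xss1m` and once with `-XX:ThreadStackSize=1024` should show whether the two options really end up with the same value on a given JVM.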

Community
suwey
  • Yes it said so, but the result is different. That's strange. – zhumengzhu May 24 '16 at 08:47
  • Yes, the article about `difference between ...` said the JVM on Windows is OK – suwey May 24 '16 at 09:04
  • @little.Pig I tried Xss2g and XX:ThreadStackSize=2g on Linux. Neither could create the VM, but the first shows `intx ThreadStackSize := 18014398507384832` and the other shows ` intx ThreadStackSize := 2147483648`. It's more complicated than I thought. – suwey May 30 '16 at 05:27
  • On my 64bit linux VM, I could create the JVM with `-Xss10239m` but not with `-Xss10240m`; and I could create the JVM with `-XX:ThreadStackSize=6m` but not with `-XX:ThreadStackSize=7m`. The `ulimit -a` is `stack size (kbytes, -s) 10240`. It was odd. – zhumengzhu Jun 01 '16 at 07:35