In terms of runtime performance, how expensive is it to cast an int to a short in Java? There may be thousands of such casts, so I wonder whether it would impact performance. Thanks.
7 Answers
No, it won't impact performance. It is a single, simple operation. When you analyze the performance of a piece of software, you are better off focusing on the computational cost of its algorithms as a function of the input size.

- @bvdb: unfortunately I cannot even delete my answer because it's marked as the accepted one. I should have posted a comment instead. – Heisenbug Aug 01 '15 at 22:01
- I'm sorry, I'll remove my overly aggressive comment. :) Maybe I had a bad day. – bvdb Aug 02 '15 at 09:02
Leaving philosophical arguments and excuses aside ...

First of all, `short` is a special data type. Java doesn't really like `short`; Java actually loves `int`, because the JVM stack works in 32-bit slots, and a `short` is a 16-bit data type (`int` = 32-bit).

Because of that 32-bit structure, whenever Java moves a `short` onto the stack it is automatically widened to an `int`. So the first thing to ask is really: do I want to use `short`s in Java at all? They do come with a cost. That's why you will rarely see the `short` data type used in the JDK source code.

The JVM uses the `i2s` instruction when converting an `int` to a `short`. The exact cost will depend on which JVM you are using and on your hardware. You can find some related stats in this paper, but `i2s` is unfortunately not listed. It should take less than 20 ns, though.

You can neglect the cost of that cast. You won't notice thousands of such casts.

Below is a test for the long case, but it can easily be adapted to short. Casting the int to a long is, in this example, about 5% slower than just passing in a long. Interestingly, calling the method with an implicit cast is slower than casting it yourself:

With no cast      : PT0.141517096S
With cast         : PT0.148024511S
With implicit cast: PT0.159904349S
// Requires JUnit 4 (org.junit.Test) and java.time.Duration.
@Test
public void testPerformance() {
    long sum = 0L;
    long timeCallWithImplicitCast = 0;
    long timeCallWithLong = 0;
    long timeCallWithCast = 0;
    for (int j = 0; j < 2; j++) { // first iteration is a warm-up
        timeCallWithCast = 0;
        timeCallWithLong = 0;
        timeCallWithImplicitCast = 0;
        for (int i = 0; i < 10_000_000; i++) {
            long s1 = System.nanoTime();
            sum += shift(i); // call with implicit int-to-long cast
            long e1 = System.nanoTime();
            timeCallWithImplicitCast += (e1 - s1);
        }
        for (int i = 0; i < 10_000_000; i++) {
            long s3 = System.nanoTime();
            sum += shift((long) i); // call with explicit cast to long
            long e3 = System.nanoTime();
            timeCallWithCast += (e3 - s3);
        }
        for (int i = 0; i < 10_000_000; i++) {
            long l = (long) i;
            long s2 = System.nanoTime();
            sum += shift(l); // call with a long, no cast at the call site
            long e2 = System.nanoTime();
            timeCallWithLong += (e2 - s2);
        }
    }
    System.out.println("With no cast      : " + Duration.ofNanos(timeCallWithLong));
    System.out.println("With cast         : " + Duration.ofNanos(timeCallWithCast));
    System.out.println("With implicit cast: " + Duration.ofNanos(timeCallWithImplicitCast));
}

protected long shift(long index) {
    return index << 4;
}

- `System.nanoTime();` has *huge* overhead compared to a single `index << 4;`. And optimization should be able to prove that `(long)i` doesn't actually take any extra work, since it can already have `int i` zero-extended to 64-bit in a register. Microbenchmarking is hard, and the cost (if any) will vary significantly by use-case. Good try, but I don't expect this tells us much. (See also [Idiomatic way of performance evaluation?](https://stackoverflow.com/q/60291987) re: things that are way shorter than the cost of reading the clock, or the CPU's out-of-order exec window.) – Peter Cordes Jul 12 '22 at 01:40
I think the safe practice is not to worry about performance until you actually have a performance problem. And when you do, in most business applications the majority of an application's sluggishness can be accounted for by its interactions with the disk and/or network. It's very unlikely that micro-optimizations like this will have much of an impact on your performance.

- I disagree; it's good to keep performance in mind. I have had to fix far too much code because of devs who didn't take performance into account, or because of bugs that were caused for the same reason. – John Kane May 12 '11 at 19:46
- There are simple and safe ways to keep performance good without stepping into the realm of optimisation, e.g. not creating temporary objects within a loop unless strictly necessary, avoiding autoboxing and so forth. Optimisation should definitely be the last thing and avoided completely if possible. It's better for the code to be functional, maintainable and correct than optimal and broken. – locka May 12 '11 at 19:57
- @John - unless you have a pretty strong understanding of how the JVM is going to optimize your code at runtime, I'd argue that most micro-optimizations are, at best, guesses. – DaveH May 12 '11 at 20:07
- I completely agree. You shouldn't try to make micro-optimizations, but you should keep performance in mind. – John Kane May 12 '11 at 20:15
Why do you need to do this? I do not think it would affect performance much, but keep in mind the range of the data type you need:

int: -2,147,483,648 to 2,147,483,647
short: -32,768 to 32,767
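A quick sketch of why the range matters: casting an out-of-range `int` to `short` does not fail, it silently keeps the low 16 bits and wraps around (the values below are just for illustration).

```java
public class RangeDemo {
    public static void main(String[] args) {
        int fits = 1000;       // inside short's range
        int tooBig = 100_000;  // outside short's range

        System.out.println((short) fits);   // prints 1000
        System.out.println((short) tooBig); // low 16 bits reinterpreted as signed: prints -31072
    }
}
```

If silent truncation is not acceptable, `Math.toIntExact`-style checking has to be done by hand for `short`, e.g. by comparing against `Short.MIN_VALUE` and `Short.MAX_VALUE` before the cast.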

The cast is cheap compared with loading an int from memory or storing a short. In any case, each of these costs on the order of 2 nanoseconds, so thousands of them will cost only a few microseconds.
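The memory side of this can be sketched quickly (my illustration, not part of the answer above): the practical payoff of `short` is halved storage in large arrays, not faster arithmetic, since each element takes 2 bytes instead of 4.

```java
public class SizeDemo {
    public static void main(String[] args) {
        // Per-element sizes of the two primitive types.
        System.out.println(Short.BYTES);   // prints 2
        System.out.println(Integer.BYTES); // prints 4

        // A short[] holds the same number of elements in roughly half the heap:
        short[] samples = new short[1_000_000]; // ~2 MB of element data
        int[] counters = new int[1_000_000];    // ~4 MB of element data
        System.out.println(samples.length == counters.length); // prints true
    }
}
```

This is why `short` shows up mostly in large buffers (audio samples, pixel data) rather than in local arithmetic, where values live in 32-bit stack slots anyway.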

- On real CPUs, separate [instructions don't have a time cost that adds linearly](https://stackoverflow.com/a/51622129/224132). Superscalar out-of-order exec means that performance has 3 dimensions: front-end cost (# of instructions or uops), back-end execution-port bottlenecks (e.g. only 2 loads per clock on many CPUs), and latency bottlenecks of dependency chains. A load that hits in L1d cache often has 5-cycle latency (from address to data being ready) on modern CPUs, including an x86-64 `movzx eax, word [rdi]` zero-extending 16 -> 32-bit load. But in those 5 cycles, 10 loads can start. – Peter Cordes Jul 12 '22 at 01:33
- @petercordes I agree it's non-linear; it might be nothing, or it might cost more for incidental reasons, e.g. more bytecode in a method can result in the method not being inlined – Peter Lawrey Jul 17 '22 at 06:55