1

To what extent can a JIT replace platform independent code with processor-specific machine instructions?

For example, the x86 instruction set includes the BSWAP instruction to reverse a 32-bit integer's byte order. In Java the Integer.reverseBytes() method is implemented using multiple bitwise masks and shifts, even though in x86 native code it could be implemented in a single instruction using BSWAP. Are JITs (or static compilers for that matter) able to make the change automatically or is it too complex or not worth it due to a poor speed/time tradeoff?

(I know that this is in most cases a micro-optimisation, but I'm interested none the less.)

Gnat
  • 2,861
  • 1
  • 21
  • 30
  • 1
    *Can* is rarely the problem with such low-level optimizations. The question usually is if it's considers useful enough to be actually done. –  Sep 13 '11 at 15:54

1 Answers1

1

For this case, yes, the hotspot server compiler could do this optimization. The reverseBytes() methods are registered as vmIntrinsics in hotspot. When jit compiler compile these methods, it will generate a special IR node, not compile the whole method. And this node will be translated into 'bswap' in x86. see src/share/vm/opto/library_call.cpp

//----------------------------   inline_reverseBytes_int/long/char/short-------------------
// inline Integer.reverseBytes(int)
// inline Long.reverseBytes(long)
// inline Character.reverseBytes(char)
// inline Short.reverseBytes(short)
bool LibraryCallKit::inline_reverseBytes(vmIntrinsics::ID id) {
  assert(id == vmIntrinsics::_reverseBytes_i || id == vmIntrinsics::_reverseBytes_l ||
         id == vmIntrinsics::_reverseBytes_c || id == vmIntrinsics::_reverseBytes_s,
         "not reverse Bytes");
  if (id == vmIntrinsics::_reverseBytes_i && !Matcher::has_match_rule(Op_ReverseBytesI))  return false;
  if (id == vmIntrinsics::_reverseBytes_l && !Matcher::has_match_rule(Op_ReverseBytesL))  return false;
  if (id == vmIntrinsics::_reverseBytes_c && !Matcher::has_match_rule(Op_ReverseBytesUS)) return false;
  if (id == vmIntrinsics::_reverseBytes_s && !Matcher::has_match_rule(Op_ReverseBytesS))  return false;
  _sp += arg_size();        // restore stack pointer
  switch (id) {
  case vmIntrinsics::_reverseBytes_i:
    push(_gvn.transform(new (C, 2) ReverseBytesINode(0, pop())));
    break;
  case vmIntrinsics::_reverseBytes_l:
    push_pair(_gvn.transform(new (C, 2) ReverseBytesLNode(0,pop_pair())));
    break;
  case vmIntrinsics::_reverseBytes_c:
    push(_gvn.transform(new (C, 2) ReverseBytesUSNode(0, pop())));
    break;
  case vmIntrinsics::_reverseBytes_s:
    push(_gvn.transform(new (C, 2) ReverseBytesSNode(0, pop())));
    break;
  default:
;
  }
  return true;
}

and src/cpu/x86/vm/x86_64.ad

instruct bytes_reverse_int(rRegI dst) %{
  match(Set dst (ReverseBytesI dst));

  format %{ "bswapl  $dst" %}
  opcode(0x0F, 0xC8);  /*Opcode 0F /C8 */
  ins_incode( REX_reg(dst), OpcP, opc2_reg(dst) );
  ins_pipe( ialu_reg );
%}
Kuai Wei
  • 86
  • 4
  • Excellent, thanks. So in this example (and undoubtedly others) the optimisation is implemented as a special case for inlining, rather than analysing the bytecode and saying "this calculation is equivalent to BSWAP, so I'll use that". Another good reason to use the libraries when possible instead of rolling your own code. – Gnat Sep 17 '11 at 07:45