Pick a random generator that is as fast and as good as you need it to be, and that isn't slowed down to a tiny fraction of its normal speed by thread safety mechanisms. Then pick a method of generating the [1..6] integer distribution that is a fast and as precise as you need it to be.
The fastest simple generator that is of sufficiently high quality to beat standard tests for PRNGs like TestU01 (instead of failing systematically, like the Mersenne Twister) is Sebastiano Vigna's xorshift64*. I'm showing it as C code but Sebastiano has it in Java as well:
uint64_t xorshift64s (int64_t &x)
{
x ^= x >> 12;
x ^= x << 25;
x ^= x >> 27;
return x * 2685821657736338717ull;
}
Sebastiano Vigna's site has lots of useful info, links and benchmark results. Including papers, for the mathematically inclined.
At that high resolution you can simply use 1 + xorshift64s(state) % 6
and the bias will be immeasurably small. If that is not fast enough, implement the modulo division by multiplication with the inverse. If that is not fast enough - if you cannot afford two MULs per variate - then it gets tricky and you need to come back here. xorshift1024* (Java) plus some bit trickery for the variate would be an option.
Batching - generating an array full of numbers and processing that, then refilling the array and so on - can unlock some speed reserves. Needlessly wrapping things in classes achieves the opposite.
P.S.: if ThreadLocalRandom and xorshift* are not fast enough for your purposes even with batching then you might be going about things in the wrong way, or you might be doing it in the wrong language. Or both.
P.P.S.: in languages like Java (or C#, or Delphi), abstraction is not free, it has a cost. In Java you also have to reckon with things like mandatory gratuitous array bounds checking, unless you have a compiler that can eliminate those checks. Teasing high performance out of a Java program can get very involved... In C++ you get abstraction and performance for free.