UTF-8's decoding algorithm works like this: you do up to 3 conditional tests against the first byte to figure out how many bytes the sequence occupies, then you assemble that many bytes into a codepoint.
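Roughly, in C (a minimal sketch with my own function name; validation of continuation bytes and overlong forms is left out):

```c
#include <stdint.h>
#include <stddef.h>

/* Sketch only: decode one UTF-8 sequence into a codepoint.
   Assumes well-formed input; writes the sequence length to *len. */
static uint32_t utf8_decode(const unsigned char *s, size_t *len)
{
    unsigned char b = s[0];
    if (b < 0x80) {                  /* test 1: 1-byte sequence (ASCII) */
        *len = 1;
        return b;
    } else if ((b & 0xE0) == 0xC0) { /* test 2: 2-byte sequence */
        *len = 2;
        return ((uint32_t)(b & 0x1F) << 6) | (s[1] & 0x3F);
    } else if ((b & 0xF0) == 0xE0) { /* test 3: 3-byte sequence */
        *len = 3;
        return ((uint32_t)(b & 0x0F) << 12) |
               ((uint32_t)(s[1] & 0x3F) << 6) |
               (s[2] & 0x3F);
    } else {                         /* otherwise: 4-byte sequence */
        *len = 4;
        return ((uint32_t)(b & 0x07) << 18) |
               ((uint32_t)(s[1] & 0x3F) << 12) |
               ((uint32_t)(s[2] & 0x3F) << 6) |
               (s[3] & 0x3F);
    }
}
```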
UTF-16's encoding algorithm works by taking the codepoint and checking whether it is larger than 0xFFFF. If so, you encode it as a surrogate pair (two 16-bit code units); otherwise, you encode it as a single 16-bit code unit.
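Again a sketch, assuming the codepoint is already a valid scalar value (not a lone surrogate, not above 0x10FFFF):

```c
#include <stdint.h>
#include <stddef.h>

/* Sketch only: encode one codepoint as UTF-16.
   Writes one or two 16-bit code units; returns how many it wrote. */
static size_t utf16_encode(uint32_t cp, uint16_t *out)
{
    if (cp > 0xFFFF) {                            /* the one conditional test */
        cp -= 0x10000;
        out[0] = (uint16_t)(0xD800 | (cp >> 10));   /* high surrogate */
        out[1] = (uint16_t)(0xDC00 | (cp & 0x3FF)); /* low surrogate  */
        return 2;
    }
    out[0] = (uint16_t)cp;
    return 1;
}
```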
Here's the thing though. Every codepoint larger than 0xFFFF is encoded in UTF-8 with 4 code units, and every codepoint 0xFFFF or smaller is encoded with 3 code units or fewer. Therefore, if you did the UTF-8 decoding to produce the codepoint... you don't have to do the conditional test in the UTF-16 encoding algorithm. Based on how you decoded the UTF-8 sequence, you already know whether the codepoint needs one 16-bit code unit or two.
Therefore, in theory, a hand-coded UTF-8->UTF-16 converter could involve one fewer conditional test than going through a codepoint intermediate. But really, that's the only difference. Even for 4-byte UTF-8 sequences, you have to assemble the UTF-8 value into a full 32-bit codepoint before you can do the surrogate-pair encoding. So the only real efficiency gain available is skipping that one conditional, as in the sketch below.
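Here's what that fused converter might look like (same caveats: my own names, no validation). The only structural difference from decode-then-encode is that the 4-byte branch goes straight to the surrogate pair without re-testing the codepoint:

```c
#include <stdint.h>
#include <stddef.h>

/* Sketch only: convert one UTF-8 sequence directly to UTF-16.
   The branch that decoded the sequence already knows how many UTF-16
   code units to emit. Returns the number of UTF-16 code units written;
   *len gets the UTF-8 sequence length. */
static size_t utf8_to_utf16(const unsigned char *s, size_t *len, uint16_t *out)
{
    unsigned char b = s[0];
    if (b < 0x80) {
        *len = 1;
        out[0] = b;
        return 1;
    } else if ((b & 0xE0) == 0xC0) {
        *len = 2;
        out[0] = (uint16_t)(((b & 0x1F) << 6) | (s[1] & 0x3F));
        return 1;
    } else if ((b & 0xF0) == 0xE0) {
        *len = 3;
        out[0] = (uint16_t)(((b & 0x0F) << 12) | ((s[1] & 0x3F) << 6) | (s[2] & 0x3F));
        return 1;
    } else {
        /* 4-byte sequence: we still have to assemble the full codepoint,
           but we can emit the surrogate pair without testing it. */
        uint32_t cp = ((uint32_t)(b & 0x07) << 18) |
                      ((uint32_t)(s[1] & 0x3F) << 12) |
                      ((uint32_t)(s[2] & 0x3F) << 6) |
                      (s[3] & 0x3F);
        *len = 4;
        cp -= 0x10000;
        out[0] = (uint16_t)(0xD800 | (cp >> 10));
        out[1] = (uint16_t)(0xDC00 | (cp & 0x3FF));
        return 2;
    }
}
```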
For UTF-16->UTF-8, you know that any surrogate pair requires 4 bytes in UTF-8, and any non-surrogate code unit requires 3 or fewer. And you have to do that surrogate test before decoding the UTF-16 anyway. But you still basically have to do all of the work of converting the UTF-16 to a codepoint before the UTF-8 encoder can do its job (even if that work is nothing, as it is for non-surrogates). So again, the only efficiency gain comes from dropping one conditional test.
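A sketch of that direction, where the surrogate test you had to do anyway doubles as the length decision (handling of unpaired surrogates omitted):

```c
#include <stdint.h>
#include <stddef.h>

/* Sketch only: convert one UTF-16 unit or surrogate pair directly to UTF-8.
   A pair always becomes 4 bytes; anything else becomes 1 to 3 bytes.
   Returns the number of UTF-8 bytes written; *consumed gets the number
   of UTF-16 code units read. */
static size_t utf16_to_utf8(const uint16_t *in, size_t *consumed, unsigned char *out)
{
    uint16_t u = in[0];
    if (u >= 0xD800 && u <= 0xDBFF) {   /* high surrogate: pair -> 4 bytes */
        uint32_t cp = 0x10000 +
                      (((uint32_t)(u - 0xD800) << 10) | (uint32_t)(in[1] - 0xDC00));
        *consumed = 2;
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    *consumed = 1;
    if (u < 0x80) {                      /* 1 byte */
        out[0] = (unsigned char)u;
        return 1;
    } else if (u < 0x800) {              /* 2 bytes */
        out[0] = (unsigned char)(0xC0 | (u >> 6));
        out[1] = (unsigned char)(0x80 | (u & 0x3F));
        return 2;
    } else {                             /* 3 bytes */
        out[0] = (unsigned char)(0xE0 | (u >> 12));
        out[1] = (unsigned char)(0x80 | ((u >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (u & 0x3F));
        return 3;
    }
}
```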
These sound like micro-optimizations. If you do a lot of such conversions, and they're performance-critical, it might be worthwhile to hand-code a converter. Maybe.