2

I am hoping to clear up some confusion here. This seems like a simple one but I can't find a clear answer.

Should I always use the explicit return type of a function unless I have a good reason not to (e.g., a need to conserve memory), even if I know the range of what can be returned will fit in a smaller type?

Using the following function as an example:

size_t std::string::find_last_of (char c, size_t pos = npos) const;

Is this:

std::string s("woof.woof.meo.w");
size_t result = s.find_last_of('.');

Less, equally, or more efficient than this:

std::string s("woof.woof.meo.w");
unsigned char result = s.find_last_of('.');

I think a size_t is constructed initially no matter what type the result is copied to, so there is no performance gain from implicitly converting to a smaller type. But is there a performance hit? What happens to the redundant bits from the larger type?

Thanks for your time and I appreciate your guidance.

trumpet99
  • Welcome to Stack Overflow! Why would you do that? A smaller type rarely means better performance. – L. F. Oct 17 '19 at 10:17
  • These are compile-time conversions; there would be no performance hit unless you are using `dynamic_cast`. – traintraveler Oct 17 '19 at 10:19
  • I agree with @traintraveler and would add that you can use `auto` so you don't need to worry about this – NutCracker Oct 17 '19 at 10:19
  • @traintraveler I am not 100% sure about zero operations at runtime. Unsigned overflow wraps around; does this not need additional operations? – 463035818_is_not_an_ai Oct 17 '19 at 10:22
  • It could cause additional instructions to be added, but I don't think that affects performance. – traintraveler Oct 17 '19 at 10:28
  • 4
    Do not micro-optimize! This kind of effort is pointless and can only do harm. Remember that in `C` and `C++` there is the "as-if rule", so the compiler can optimize lots of stuff for you and you do not have to write strange contraptions. Keep your code simple and think about algorithmic complexity. Use `auto`. – Marek R Oct 17 '19 at 10:39
  • @formerlyknownas_463035818 Really depends on the hardware. On x86 you can access e.g. the low byte of a 64 bit register directly. But this is all really pointless to discuss without a very precise optimization-need context. – Max Langhof Oct 17 '19 at 10:56
  • 2
    I would emphasize how minuscule any possible performance drawback here is going to be. This conversion will cost you no more than a fraction of a cycle (and usually it will cost absolutely nothing). Your program would have to run _billions_ of times to make back the time you spent even _thinking_ about this performance difference. Unless you have profiling results that strongly suggest a bottleneck on that line, optimizing it is strictly a waste of time. That's leaving aside correctness questions (addressed in the answers). – Max Langhof Oct 17 '19 at 11:00
  • 2
    @MaxLanghof: One has to be careful, though. Although you certainly _can_ access the low byte of a register really easily and seemingly for free, it is by no means obvious when it's an advantage and when it is _actually_ free. For example, there's false dependencies that may come in your way, or the rather complicated rules of store forwarding, and whatnot. And oh the joy, the rules are not only [complex](https://stackoverflow.com/questions/45660139/how-exactly-do-partial-registers-on-haswell-skylake-perform-writing-al-seems-to) and different between manufacturers, but even between submodels. – Damon Oct 17 '19 at 11:14
  • 2
    @Damon All great points, but this only further underlines that discussing this optimization without a concrete performance problem to solve is way too broad/a waste of time. This is rather something for compilers to figure out in most cases. – Max Langhof Oct 17 '19 at 11:20
  • 1
    It doesn't really matter for something like this (*unless* you have actually profiled your code and found this to be a *real* performance bottleneck (I doubt it)). I'd just use `const auto result = ...` and move on to more important issues. – Jesper Juhl Oct 17 '19 at 11:32
  • Well, it does matter, but for another reason. C++ is designed in a, well, forgive me ranting, pretty stupid way, sometimes. You have types like `size_t` (as in the example) or `ptrdiff_t` or `nullptr_t` which exist for good reason and do very specific things and which are very obviously built-in types and keywords, however _they are not_ built-in types or keywords at all! You have to include a darn header just to use these. But yeah, at least we now have _two different_ keywords for compile-time constants and possible compile-time constants, and still no proper (throw value) exception system. – Damon Oct 17 '19 at 14:17
  • @Damon I agree that it's a rant, but I don't get the reason for ranting here. Literally anything that makes you use a `size_t` comes with `size_t` already declared. Situations where you explicitly need to include a header to get `size_t` are extremely rare – 463035818_is_not_an_ai Oct 17 '19 at 19:54

1 Answer

2

I am not going to answer your question directly.

Correctness is more important than performance. In your example, using `unsigned char` does not make the code incorrect, but it introduces an implicit assumption: the resulting position is small enough to fit in an `unsigned char`. Taking this to the extreme, you could write:

std::string s("woof.woof.meo.w");
const unsigned char result = 13;

I am exaggerating to get the point across. If possible, the second line should be correct independently of what happens before it. A `size_t` is big enough to hold any size a string can have; this is not true for an `unsigned char`. Using `unsigned char` implies that you silence or ignore the warning about converting to a smaller type, and consequently you would also miss that warning once your string becomes long enough to make the code fail.

Conclusion: the safety of using the right type easily outweighs the tiny performance difference, if there were any (in practice there is none).

463035818_is_not_an_ai