
I'm new to C++ and I have a question about storing the element count in a variable versus calling .size() in the loop condition.

Why did I have this question? Because I have an internal contradiction!

In all the other languages I have used, be it PHP, TypeScript or Python, I have always stored the count in a variable before iterating, because in those languages it is considered good practice. In PHP, for example:

$count = count($arr);
for($i = 0; $i < $count; $i++) ...

But! I've looked at the sources of various open source C++ projects, and you guys who use C++ seem to always do it like this:

for (std::size_t i = 0; i < bt.size() && i < BACKTRACE_HASHED_LENGTH; i++) {
    h = h * 0x4372897893428797lu + reinterpret_cast<std::uintptr_t>(bt[i]);
}

So, is it okay for C++ to always use .size()? Or how do you do it?

Chris
volama1699
  • Personally I still store it to a separate variable but it is less necessary because the compiler can inline the function call of `size()` (at least for standard containers) and figure out for itself that it can reuse the value. This assumes that the compiler can prove nobody else could have manipulated the value within the loop. This assumption usually holds as long as a) the container is a local variable and/or b) there are no (non-inlined) function-calls in the loop body – Homer512 Jul 27 '23 at 06:28
  • What "total" do you mean? What each loop should do depends very much on what the loop is supposed to be doing. For example, if you want to iterate over some elements of a vector, use [a range `for` loop](https://en.cppreference.com/w/cpp/language/range-for); if you need more control, use a "normal" `for` loop with iterators; if you need the indexes, then use integers and the container size. – Some programmer dude Jul 27 '23 at 06:29
  • Many C++ coding practices may be trying to be a little more on the terse side, so doing the function call inside the for() header is more common to avoid that extra line. In this case it really comes down to preference, the resulting binary is most likely going to be exactly the same. If you are looping over an iterable container of some sort (std::vector, etc), there's also the for-each type syntax: `for(auto bla : foo())` – nick Jul 27 '23 at 06:31
  • By the way, as someone who is doing their "first time with C++", the C++ example you show isn't what I would call very suitable. As a total beginner with C++ I really recommend you invest in [some good books](https://stackoverflow.com/questions/388242/the-definitive-c-book-guide-and-list) to read, and do their exercises. And whatever you do, stay away from so-called "competition" or "judge" sites, they are *not* any kind of learning or teaching resource, especially for a beginner who needs to learn the basics. – Some programmer dude Jul 27 '23 at 06:45
  • BTW, this is a simple way of storing the value without using an extra line: `for(size_t i = 0, n = bt.size(); i < n; ++i)`. You can declare and initialize two variables of the same type in the loop's init-statement. – Homer512 Jul 27 '23 at 06:47

2 Answers


If the collection is not mutated in the loop (and that way lies madness), then its size member function returns the same value on every iteration, and a smart enough compiler can often figure this out and optimize away the repeated call, which removes the performance concern you have.

However, a modern C++ programmer would likely reach for a range-based or iterator-based approach rather than the index-based loop you've shown, and thus sidestep the whole issue. Compare the index-based version:

#include <cstddef>
#include <iostream>
#include <vector>

int main() {
    std::vector<int> vec { 1, 2, 3, 4, 5, 6, 7 };

    for (std::size_t i = 0; i < vec.size(); i++) {
        std::cout << vec[i] << std::endl;
    }
}

As opposed to:

#include <iostream>
#include <vector>

int main() {
    std::vector<int> vec { 1, 2, 3, 4, 5, 6, 7 };

    for (const auto &x : vec) {
        std::cout << x << std::endl;
    }
}

Or, going a step further along this line, use a standard algorithm:

#include <iostream>
#include <vector>
#include <algorithm>
#include <iterator>

int main() {
    std::vector<int> vec { 1, 2, 3, 4, 5, 6, 7 };

    std::copy(
      vec.cbegin(), vec.cend(),
      std::ostream_iterator<int>(std::cout, "\n")
    );
}
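
The "madness" of mutating the collection inside the loop can be made concrete with a small sketch. The two helper names below are hypothetical and not from the question; they only illustrate that re-reading `size()` and caching it are the same thing when the container is left alone, and stop being the same thing once the loop body grows it.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: re-reads size() on every iteration, so elements
// appended inside the loop are visited as well.
std::size_t visits_with_live_size(std::vector<int> v) {
    std::size_t visits = 0;
    for (std::size_t i = 0; i < v.size(); ++i) {
        if (v[i] == 0) {
            v.push_back(1);  // grows the vector while we iterate by index
        }
        ++visits;
    }
    return visits;
}

// Hypothetical helper: reads size() once, so the loop only covers the
// elements that existed before the loop started.
std::size_t visits_with_cached_size(std::vector<int> v) {
    std::size_t visits = 0;
    for (std::size_t i = 0, n = v.size(); i < n; ++i) {
        if (v[i] == 0) {
            v.push_back(1);
        }
        ++visits;
    }
    return visits;
}
```

Called with a vector holding `0, 0, 5`, the first helper visits the two appended elements too, while the second stops after the original three. Note that this only works with indexes: `push_back` may invalidate iterators, so the same body inside a range `for` loop would be undefined behavior.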
Chris

Yes, it's OK and efficient to call .size() within a for loop condition, because modern C++ compilers can hoist the .size() call out of the loop, as long as the compiler can prove that the size does not change in the loop body.

Example: GCC produces identical machine code for these two versions:

int sum = 0;
for (size_t i = 0; i < vec.size(); i++) { sum += vec[i]; }

and

int sum = 0;
auto size = vec.size();
for (size_t i = 0; i < size; i++) { sum += vec[i]; }

Reference: https://godbolt.org/z/aM4PbWc1x
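
For anyone who wants to compare the variants in Compiler Explorer themselves, here is a minimal sketch that wraps the two snippets above in complete functions and adds a range `for` version. The function names are made up for this illustration, and the sketch only shows that the three forms compute the same result; whether the generated code is identical depends on the compiler and its optimization flags.

```cpp
#include <cstddef>
#include <vector>

// size() is evaluated in the loop condition on every iteration.
int sum_size_in_condition(const std::vector<int>& vec) {
    int sum = 0;
    for (std::size_t i = 0; i < vec.size(); ++i) {
        sum += vec[i];
    }
    return sum;
}

// size() is read once in the init-statement and kept in n.
int sum_cached_size(const std::vector<int>& vec) {
    int sum = 0;
    for (std::size_t i = 0, n = vec.size(); i < n; ++i) {
        sum += vec[i];
    }
    return sum;
}

// No index and no explicit size() call at all.
int sum_range_for(const std::vector<int>& vec) {
    int sum = 0;
    for (int x : vec) {
        sum += x;
    }
    return sum;
}
```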

chrysante
Olli
  • [Well, as soon as you escape a pointer to `vec` in the loop body, it will not be optimized anymore.](https://godbolt.org/z/YbaMYvver) So if you really care about `size()` not being recalculated you should cache it yourself. – chrysante Jul 27 '23 at 08:00
  • @chrysante you made a curious example above. The function that caches the vector size value is overall larger than the non-caching version (30 vs 38 instructions), yet the looping piece is shorter (6 vs 10 instructions). Thus one version is likely faster for short loops, the other one for longer loops. Apart from critical hot-spot routines, however, differences of this kind are insignificant. – Olli Jul 28 '23 at 08:28
  • The point I was trying to make is that the function that caches `vec.size()` has one less load instruction in the loop body. The non-caching function has to reload the size of the vector from memory on every iteration, whereas the other can keep it in a register. And of course it has to reload the size, because `use_vector` could cast `const` away and change the size. I agree that most of the time the performance difference won't even be measurable, but you can't generalize the way you did by saying "modern C++ compilers can optimize these .size() calls". – chrysante Jul 28 '23 at 08:44
  • There is a reason why range based `for`-loops cache the end iterator instead of recalculating it. LLVM even has [a rule in their coding standards about this](https://llvm.org/docs/CodingStandards.html#don-t-evaluate-end-every-time-through-a-loop) :-) – chrysante Jul 28 '23 at 08:45
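
The situation discussed in these comments can be sketched roughly as follows. `observe` is a hypothetical stand-in for a function like `use_vector` that the optimizer cannot see into; in a real build it would live in another translation unit, and it is defined here only so the sketch is self-contained. The second function spells out, by hand, roughly what a range `for` loop does: the end iterator is evaluated once before the loop starts.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical global, only used to make the calls observable.
std::size_t observed_calls = 0;

// Hypothetical stand-in for an opaque function receiving the vector.
// If its body were not visible, the compiler would have to assume it
// might change the vector's size.
void observe(const std::vector<int>& vec) {
    observed_calls += vec.empty() ? 0 : 1;
}

// The size is cached by hand, so the loop bound stays in n no matter
// what the compiler can or cannot prove about observe().
int sum_cached_size_with_call(const std::vector<int>& vec) {
    int sum = 0;
    for (std::size_t i = 0, n = vec.size(); i < n; ++i) {
        sum += vec[i];
        observe(vec);
    }
    return sum;
}

// Hand-written equivalent of a range for loop: end() is evaluated once.
int sum_cached_end_with_call(const std::vector<int>& vec) {
    int sum = 0;
    auto first = vec.begin();
    auto last = vec.end();  // evaluated once, before the loop
    for (; first != last; ++first) {
        sum += *first;
        observe(vec);
    }
    return sum;
}
```

Both functions return the same sum and call `observe` once per element. The difference from a loop with `vec.size()` in its condition is only that the bound is read a single time, which is the behavior the LLVM rule linked above asks for.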