3

I have a very basic 2D C++ container that relies on a single dimensional container (std::array or std::vector) and "advanced" indexing. This container is not expected to store many elements, not even 500. What would be the best approach in accessing its elements?

At first I was using int for indexing the elements directly while iterating the container. However, after reading a bit, posts such as this one made me switch to using std::size_t instead. (The switch was primarily about picking up good habits and not due to the requirements of my container. But other sources made me question whether it would be actually a good habit.)

As a result I had to readjust several for() loops to avoid underflow realated errors, e.g. when checking the values of elements next to one another, or when iteratin the elements backward in a row or a column. These changes in turn decreased the readability and made the indexing more error prone. (MyIiexperience is definitely a factor in this matter.)

What should I do?

  1. stick with std::size_t
  2. keep using int(while also restricting the size of my 2D storage to avoid unlikely overflows)
  3. stick with range or iterator based loops by creating them for columns and rows (suggestions and links are welcome)
  4. something else

Thank you in advance!

P.S.: Even C++20 functionality is welcomed!

Dudly01
  • 444
  • 3
  • 13
  • Assuming you wish 'conventional' row or col major order, consider a inventing a convenience function to compute the single dimension vector<> index from 2d (row,col) coordinates. Maybe something like "size_t to_1d_Indx (size_t row, size_t col) { return (row * maxCol) + col; }" (I'm sure you can handle 2d indexes using int, and size_t for 1d index.) – 2785528 Aug 28 '20 at 00:54

2 Answers2

3

std::ptrdiff_t is a signed std::size_t.

I use that. I have heard arguments that is what the standard should have used, at the cost of a single bit of maximum size.

Writing a stride-friendly iterator is a bit of a pain. Here is a sketch:

template<class T, class Stride=std::integral_constant<std::size_t, 1>>
struct stride_iterator {
  using difference_type = std::ptrdiff_t;
  using value_type = T;
  using reference= T&;
private:
  T* dataptr = nullptr;
  Stride datastride = {};
  T* data() const { return dataptr; }
  T*& data() { return dataptr; }
  Stride stride() const { return datastride; }
public:
  explicit stride_iterator( T* ptr, Stride s ):
    dataptr(ptr),
    datastride{ std::move(s) }
  {}
  explicit stride_iterator( T* ptr ):
    dataptr(ptr)
  {}
  stride_iterator():stride_iterator(nullptr) {}
  stride_iterator(stride_iterator const&)=default;
  stride_iterator& operator=(stride_iterator const&)& =default;
  stride_iterator(stride_iterator &&)=default;
  stride_iterator& operator=(stride_iterator &&)& =default;
  
  T& operator*() const { return *data(); }
  T* operator->() const { return data(); }
  T& operator[](std::ptrdiff_t n) const {
    return *(*this+n);
  }
  stride_iterator& operator+=( std::ptrdiff_t n )& {
    data() += (n*stride());
    return *this;
  }
  stride_iterator& operator-=( std::ptrdiff_t n )& {
    data() -= (n*stride());
    return *this;
  }
  friend stride_iterator operator+( std::ptrdiff_t rhs, stride_iterator lhs ) {
    return std::move(lhs)+rhs;
  }
  friend stride_iterator operator+( stride_iterator lhs, std::ptrdiff_t rhs ) {
    lhs += rhs; 
    return lhs;
  }
  friend stride_iterator operator-( stride_iterator lhs, std::ptrdiff_t rhs ) {
    lhs += rhs; 
    return lhs;
  }
  stride_iterator& operator++() {
    *this += 1;
    return *this;
  }
  stride_iterator& operator--() {
    *this -= 1;
    return *this;
  }
  stride_iterator operator++(int) {
    auto r = *this;
    ++*this;
    return r;
  }
  stride_iterator operator--(int) {
    auto r = *this;
    --*this;
    return r;
  }
  friend std::ptrdiff_t operator-( stride_iterator const& lhs, stride_iterator const& rhs ) {
    return (lhs.data()-rhs.data())/stride();
  }
  friend bool operator<( stride_iterator const& lhs, stride_iterator const& rhs ) {
    return lhs.data() < rhs.data();
  }
  friend bool operator<=( stride_iterator const& lhs, stride_iterator const& rhs ) {
    return lhs.data() <= rhs.data();
  }
  friend bool operator>( stride_iterator const& lhs, stride_iterator const& rhs ) {
    return lhs.data() > rhs.data();
  }
  friend bool operator>=( stride_iterator const& lhs, stride_iterator const& rhs ) {
    return lhs.data() >= rhs.data();
  }
  friend bool operator==( stride_iterator const& lhs, stride_iterator const& rhs ) {
    return lhs.data() == rhs.data();
  }
  friend bool operator!=( stride_iterator const& lhs, stride_iterator const& rhs ) {
    return lhs.data() != rhs.data();
  }
};

template<class It>
struct range {
    It b, e;
    It begin() const { return b; }
    It end() const { return e; }
    decltype(auto) operator[]( std::ptrdiff_t i ) const {
        return b[i];
    }
    std::size_t size() const { return end()-begin(); }
    bool empty() const { return end()==begin(); }
};


struct toy_matrix {
    std::ptrdiff_t width = 1;
    std::ptrdiff_t height = 1;
    std::vector<int> data = std::vector<int>( 1 );
    
    toy_matrix() = default;
    toy_matrix( std::size_t w, std::size_t h ):width(w),height(h), data(w*h) {}
    toy_matrix( std::size_t w, std::size_t h, std::initializer_list<int> il ):width(w),height(h), data(il) {
        data.resize(w*h);
    }
    
    range<stride_iterator<int>> row( std::ptrdiff_t i ) {
        int* ptr = data.data();
        ptr += height * i;
        return { stride_iterator<int>{ ptr }, stride_iterator<int>{ ptr + width } };
    }
    range<stride_iterator<int, std::ptrdiff_t>> column( std::ptrdiff_t i ) {
        int* ptr = data.data();
        ptr += i;
        return { stride_iterator<int, std::ptrdiff_t>{ ptr, width }, stride_iterator<int, std::ptrdiff_t>{ ptr + height * width, width } };
    }
    range<stride_iterator<int>> operator[]( std::ptrdiff_t i ) {
        return row(i);
    }
    int& operator()( std::ptrdiff_t x, std::ptrdiff_t y ) {
        return (*this)[x][y];
    }
};

Test code:

toy_matrix m{ 5, 4, {
    1,2,3,4,5,
    10,20,30,40,50,
    100,200,300,400,500,
    1000,2000,3000,4000,5000,
}};

for (auto x : m.column(0)) {
    std::cout << x << "\n";
}
for (auto x : m.column(1)) {
    std::cout << x << "\n";
}
for (auto x : m.column(2)) {
    std::cout << x << "\n";
}

output:

1
10
100
1000
2
20
200
2000
3
30
300
3000

Live example.

Yakk - Adam Nevraumont
  • 262,606
  • 27
  • 330
  • 524
  • Re: `cost of a single bit ` - as they say, two billions here, two billions there and soon you start talking about serious money... Or was it trillions? :) – Vlad Feinstein Aug 28 '20 at 00:33
  • 1
    @VladFeinstein The single bit only matters for single-byte data of application-breaking size on 32 bit and under platforms. If you are using more than half of your address space in a single std::vector, maybe you should be using a special purpose container, and let everyone else not deal with the pain of unsigned indexes. – Yakk - Adam Nevraumont Aug 28 '20 at 01:46
  • Your sample code is very much appreciated! This (after you edit) was exactly a functionality I wanted to implement but was lacking knowledge on how to. – Dudly01 Aug 28 '20 at 17:33
  • @Yakk-AdamNevraumont "std::ptrdiff_t [...] I use that", in the event that the std::ssize of C++20 is not available, could you elaborate on how would you use it to iterate the indexes? Would it be something like `for(std::ptrdiff_t i = 0; i != static_cast(container.size()); ++i){ ... }` ? – Dudly01 Aug 28 '20 at 18:12
  • 1
    @Dudly01 Myself, I write an indexing iterator and a factory function from range-like things that converts the size to a signed indexing iterator range and do a range-for loop over that, so I get `for (auto i : indexes_of(container))`. My range also has `.without_back(n=1)`, so I can do `for (auto i: indexes_of(container).without_back())`. What you should do, I don't know. – Yakk - Adam Nevraumont Aug 28 '20 at 18:20
1

Stroustrup prefers signed integer types unless one has a good reason to use an unsigned one. Alexander Stepanov, the creator of the STL, preferred size_t, unsigned by definition, as the type to store containers' size. So the difference is very subtle.

My personal opinion:

  • If you are a part of an organization, use their coding standards
  • Always avoid old-style for loops and prefer range-based for loops (for (auto x: vec) etc.). With this, there's no index, no room for a bug.
  • If you need the index inside the loop, prefer int and use C++20's std::ssize (signed size of a container).

Rationale:

  • std::ptrdiff_t is too lengthy to write :-) and you almost never have containers with more than 2 billion elements.
  • int in expressions is more natural and less error-prone
  • using just one unsigned variable inside a loop (or in any expression) often forces one to cast many other variables into unsigned type just to silence compiler warnings. The code gets ugly, very ugly.
  • signed integers are easier to optimize (as explained in the link included in the question)

That said, for years I used size_t (organization) or careless (int)v.size() (private code). std::ssize should have been added to the library ages ago.

zkoza
  • 2,644
  • 3
  • 16
  • 24
  • Interesting. Didn't know about `ssize()` While I generally avoid casting I have no problem with int(v.size()) because it is clear and unambiguous. I use c++11 style casts in the few other places required, typically external library interfaces, because they're a bit less error prone and easier to search for when refactoring. – doug Aug 28 '20 at 03:03