2

Considering c++ (or c++11), where I have some array of data with 2*N integers which represent N pairs. For each even i=0,2,4,6,...,2*N it holds that (data[i],data[i+1]) forms such a pair. Now I want to have a simple way to access these pairs without the need to write loops like:

for(int i=0; i<2*N; i+=2) { ... data[i] ... data[i+1] ... }

So I wrote this:

#include <iostream>

struct Pair {
    int first; int second;
};

int main() {
    int N=5;
    int data[10]= {1,2,4,5,7,8,10,11,13,14};
    Pair *pairs = (Pair *)data;

    for(int i=0; i<N; ++i)
        std::cout << i << ": (" << pairs[i].first << ", " << pairs[i].second << ")" << std::endl;

    return 0;
}

Output:

0: (1, 2)
1: (4, 5)
2: (7, 8)
3: (10, 11)
4: (13, 14)

ideaone: http://ideone.com/DyWUA8

As you can see, I cast the int pointer to a Pair pointer, such that c++ simply handles that my data is twice the size of an int. And I know, because that is how arrays work, that the data array is aligned in pairs of two sizeof(int)'s. However, I am not sure whether I can assume that a Pair is exactly two sizeof(int)'s and whether the member fields first and second are stored in that order (or alignment). Technically, in a worst case setting I can imagine that compiler stores 2 bytes for first, then 4 of second and then 2 of first (given that int's are 4 bytes), and somehow manages this. Of course, that is probably ludicrous, but is it allowed in c++?

Please note, I don't want to copy all the data to a new array and manually convert it to Pairs. Imho, that's an expensive operation for just syntax sugar.

May I assume the alignment of the Pair class? Does the same hold for structs? Are there other ways?

From what I read here ( How is the size of a C++ class determined? ) it is up to the compiler, and not in the language, of c++ how classes are aligned in memory. Does this mean that I am doomed to copy my data or use nasty syntax? Can I somehow force minimal alignment in the c++ language, or would I need compiler switches?

Community
  • 1
  • 1
Herbert
  • 5,279
  • 5
  • 44
  • 69
  • Why don't you just store an array of 5 pairs, instead of an array of 10 ints? – Benjamin Lindley May 05 '14 at 19:21
  • This is how I get the data. I'm using the Matlab mex library which provides me with a 2xN matrix, representing the pairs, as an input argument. – Herbert May 05 '14 at 19:23
  • http://stackoverflow.com/q/7160901/541686 – user541686 May 05 '14 at 19:24
  • This is not a safe conversion. If you use a `char[]` as the underlying data buffer and properly align the type you want to cast it as, it could be. – John Dibling May 05 '14 at 19:31
  • @JohnDibling Are you suggestion this: http://ideone.com/nsHp0s , because the first member of a class must always align with the beginning of the object? – Herbert May 05 '14 at 19:45
  • @Herbert No, that still violates the strict alias rules when you cast the int array to a `Pair` pointer. – Mark B May 05 '14 at 20:13
  • Because sizeof(Pair) is not necessarily equal to sizeof(int)*2, since the Pair class may have padded bytes? – Herbert May 05 '14 at 20:14

3 Answers3

2

What you have done violates the strict aliasing rules and as such causes undefined behavior, in addition to any possible size and alignment issues.

The cleanest solution would be to store the data in logical pairs rather than as flat data, by doing a one-time conversion if needed. Don't worry about the performance of doing the data conversion unless profiling shows you that that is where your execution time is being spent. The clarity of grouping your data into pairs will almost certainly pay off in correctness in the long-term.

Alternately you could create inline functions that abstract away accessing notional "first" and "second" attributes of the flat array data.

Mark B
  • 95,107
  • 10
  • 109
  • 188
  • Here, the reason that I worry about performance is not because profiling suggests it, but because I dislike doing things that are not actually needed. It's the same reason why I would not iterate through all possible integer values in order to find the one equal to 72. Note that copying the data would just be to create syntax sugar. – Herbert May 05 '14 at 19:33
  • @Herbert: But the whole motivation behind doing any of what you want to do at all seems to be syntactic sugar. – John Dibling May 05 '14 at 19:37
  • No offense however, I see your point and it is valid: correctness before speed. However, the only suggestion I read in your answer is to wrap class functionality in functions, and it is not the same. For example, that would disallow me to sort in this way: http://ideone.com/pyC8Xl , since the data is not actually interpreted ad an array of pairs by the c++ language. – Herbert May 05 '14 at 19:38
  • @JohnDibling Syntax sugar, but not at the cost of speed. I don't see why I can't have a datatype representing two minimally aligned ints and still have speedy code ;) If c++ can do the pointer-arithmatics, I prefer to let him do it instead of me. – Herbert May 05 '14 at 19:40
  • @Herbert: You can -- just not the way you are trying to do it. You also have to allow yourself to leverage compiler-specific abilities. – John Dibling May 05 '14 at 19:42
2

The alignment of a struct is at least the biggest alignment of its members, but it can be bigger. Also the compiler can add padding between your members, so your code is not safe.

Basiscally the only guarantees about struct layouts are:

  1. The first member is at offset 0.
  2. The members order in memory is the same as in the code.

You can use the first guarantee with this definition:

struct Pair {
    int p[2];
};

Now, sizeof(Pair) could be bigger than 2*sizeof(int) but that shouldn't matter too much.

Alternatively, if you want extra fun:

typedef int Pair[2];

Pointers to arrays are fun!

Anyway my advice is to do this:

int data[10]= {1,2,4,5,7,8,10,11,13,14};

for(int i=0; i<N; ++i)
{
    int *pair = data + 2*i;
    std::cout << i << ": (" << pairs[0] << ", " << pairs[1] << ")" << std::endl;
}

Or if you prefer the extra fun:

typedef int Pair[2];
int data[10]= {1,2,4,5,7,8,10,11,13,14};
Pair *pairs = (Pair*)data;

for(int i=0; i<N; ++i)
{
    std::cout << i << ": (" << pairs[i][0] << ", " << pairs[i][1] << ")" << std::endl;
}
rodrigo
  • 94,151
  • 12
  • 143
  • 190
  • Will I also know for sure that sizeof(Pair) == 2*sizeof(int), which I need since I am casting the array and need the elements from index 1 to also align. – Herbert May 05 '14 at 19:56
  • @Herbert: I think you could have padding bytes at the end of `Pair`, although I don't know why a compiler would do such a thing. Anyway, that should not affect the index==1 case. – rodrigo May 05 '14 at 19:59
  • Why not? pairs[0] would be aligned, but pairs[1] not anymore, right? – Herbert May 05 '14 at 20:01
  • Ah, yes! I thought you wanted `pair[0][1]`. I don't think you are 100% sure that there cannot be an extra padding. – rodrigo May 05 '14 at 20:03
  • I may go for your latter solution, as I would suggest that sizeof(int[2]) should be equal to 2*sizeof(int) in any case. – Herbert May 05 '14 at 20:06
1

Does this mean that I am doomed to copy my data or use nasty syntax?

No

Are there other ways?

Yes, use a wrapper class that provides the syntax you like. Here's one way

http://ideone.com/nitI0B

#include <iostream>

struct Pairs {
    int* _data;
    Pairs( int data[] ) : _data(data) {}
    int & first( size_t x ) const { return _data[x*2]; }
    int & second( size_t x ) const { return _data[x*2+1]; }
};

int main() {
    int N=5;
    int data[10]= {1,2,4,5,7,8,10,11,13,14};
    Pairs pairs( data );

    for(int i=0; i<N; ++i)
        std::cout << i << ": (" << pairs.first(i) << ", " << pairs.second(i) << ")" << std::endl;

    return 0;
}

Update

Here's a solution that wraps an int[2] in a struct (like C++11 std::array) but tolerates (in fact forces) padding by the compiler after the int[2]. The compiler is not likely to add any additional padding, but the standard doesn't preclude it. I've also added a random access iterator to allow passing iterators to std::sort and sort the original data as pairs. I did this for my one education, might be more trouble than it's worth.

See this in ideone

// http://stackoverflow.com/questions/23480041/is-the-member-field-order-of-a-class-stable
#include <iostream>
#include <algorithm>
#include <stddef.h>

struct Pair {
    int _data[2]; // _data[0] and _data[1] are consecutive,
                  // and _data[0] is at offset 0 (&Pair == &_data[0])
    int _unused[6]; // simulate the compiler inserted some padding here
    int first() { return _data[0]; }
    int second() { return _data[1]; }
    int & operator[] ( size_t x ) { return _data[x]; }
    friend inline bool operator< ( const Pair & lhs, const Pair & rhs ) {
        return lhs._data[0] < rhs._data[0];
    }
    // it is unlikely that the compiler will add any padding to a struct
    // Pair, so sizeof(Pair) == sizeof(_data)
    // however, the standard doesn't preclude it, so we define our own
    // copy constructor and assignment operator to ensure that nothing
    // extraneous is stored
    Pair( const Pair& other ) {
        _data[0] = other._data[0];
        _data[1] = other._data[1];
    }
    const Pair& operator=( const Pair& other ) {
        _data[0] = other._data[0];
        _data[1] = other._data[1];
        return *this;
    }
};

struct Pairs {
    int* _data;
    size_t _size;
    Pairs( int data[], size_t size ) : _data(data), _size(size) {}
    Pair & operator[] ( size_t x ) const {
        return *reinterpret_cast< Pair * >( _data + 2 * x );
    }
    class rai
        : public std::iterator<std::random_access_iterator_tag, Pair>
    {
        int * _p;
        size_t _size;
        size_t _x;
    public:
        rai() : _p(NULL), _size(0), _x(0) {}
        rai( int* data, size_t size )
            : _p(data), _size(size), _x(0) {}
        friend inline bool operator== (const rai& lhs, const rai& rhs) {
            return lhs._p == rhs._p && lhs._x == rhs._x;
        }
        friend inline bool operator!= (const rai& lhs, const rai& rhs) {
            return lhs._p != rhs._p || lhs._x != rhs._x;
        }
        Pair& operator* () const {
            return *reinterpret_cast< Pair * >( _p + 2 * _x );
        }
        rai& operator+=( ptrdiff_t n ) {
            _x += n;
            if (_x >= _size) { _x = _size = 0; _p = NULL; }
            return *this;
        }
        rai& operator-=( ptrdiff_t n ) {
            if (n > _x) _x = 0;
            else _x -= n;
            return *this;
        }
        friend inline rai operator+ ( rai lhs, const ptrdiff_t n ) {
            return lhs += n;
        }
        friend inline rai operator- ( rai lhs, const ptrdiff_t n ) {
            return lhs -= n;
        }
        friend inline bool operator< ( const rai & lhs, const rai & rhs ) {
            return *lhs < *rhs;
        }
        rai& operator++ () { return *this += 1; }
        rai& operator-- () { return *this -= 1; }
        friend inline ptrdiff_t operator-(const rai& lhs, const rai& rhs) {
            return lhs._p == NULL
                ? (rhs._p == NULL ? 0 : rhs._size - rhs._x)
                : lhs._x - rhs._x;
        }
    };
    inline rai begin() { return rai(_data,_size); }
    static inline const rai end() { return rai(); }
};

int main() {
    int N=5;
    int data[10]= {1,2,7,8,13,14,10,11,4,5};
    Pairs pairs( data, N );

    std::cout << "unsorted" << std::endl;
    for(int i=0; i<N; ++i)
       std::cout << i << ": (" << pairs[i].first() << ", "
                 << pairs[i].second() << ")" << std::endl;

    std::sort(pairs.begin(), pairs.end());

    std::cout << "sorted" << std::endl;
    for(int i=0; i<N; ++i)
        std::cout << i
            << ": (" << pairs[i][0]  // same as pairs[i].first()
            << ", "  << pairs[i][1]  // same as pairs[i].second()
            << ")" << std::endl;

    return 0;
}
amdn
  • 11,314
  • 33
  • 45
  • That would not allow me to use std on the data, for example to sort it: http://ideone.com/pyC8Xl , right? – Herbert May 05 '14 at 19:59
  • @Herbert we might need a random access iterator for that... I started on it, will get to it again later – amdn May 05 '14 at 21:50
  • Wouldn't it be easier to use an array of int[2]s? – Herbert May 06 '14 at 08:46
  • @Herbert yes, I think I tried that and it didn't work in gcc or clang, but can't remember why... I'll play around with it this evening, it is an interesting challenge :) – amdn May 06 '14 at 16:02
  • I accepted your answer because it does what I asked for, however, to be perfectly honest, I think this overcomplicates the problem; that is way too much code for something simple ;) Thank you for the effort though, it does illustrate a good point on how c++ has some limitations regarding manual memory segmentation and interpretation. – Herbert May 22 '14 at 09:47
  • Thanks, I took it as a challenge to meet the criteria and be standard compliant - it was a learning exercise - I totally agree it is a lot of code and an eye opener. – amdn May 22 '14 at 09:54
  • Your efforts are highly appreciated. – Herbert May 22 '14 at 10:27