C++ templates optimization

Question

To what extent will (or can) a compiler optimize the template method below? Will it remove unreachable code and unroll unnecessary loops? (Bits uses unsigned int blocks; Integer uses unsigned long ones.)

Plus, is there a C++ data type meaning "I'm an integer the size of your processor's registers"?

template<size_t bits> class IntegerFactoryImpl : public IntegerFactory<Integer<bits>>{
private:
    template<int sizeOfLong, int sizeOfInt> Integer<bits> getOne(const Bits& b) const{

        Integer<bits> integer = this->getOne();
        size_t roof = (b.blocks() > integer.size()*(sizeOfLong/sizeOfInt))? integer.size()*(sizeOfLong/sizeOfInt) : b.blocks();
        for(size_t i = 0; i < roof; ++i){
            if(i % (sizeOfLong/sizeOfInt) == 0){
                integer.at(i/(sizeOfLong/sizeOfInt)) = 0; // clear each destination block once, before OR-ing into it
            }
            for(size_t j = 0; j < (sizeOfLong/sizeOfInt); ++j){
                if(i % (sizeOfLong/sizeOfInt) == j){
                    integer.at(i/(sizeOfLong/sizeOfInt)) |= ((unsigned long)b.block(b.blocks()-i-1)) << (sizeOfInt*j);
                    break;
                }
            }
        }
        for(size_t i = roof; i < integer.size()*(sizeOfLong/sizeOfInt); ++i){
            if(i % (sizeOfLong/sizeOfInt) == 0){
                integer.at(i/(sizeOfLong/sizeOfInt)) = 0;
            }
        }
        return integer;
    }

public:

    virtual ~IntegerFactoryImpl() throw(){}

    virtual Integer<bits> getOne() const{
        return Integer<bits>();
    }

    virtual Integer<bits> getOne(const Bits& b) const{
        return this->getOne<sizeof(unsigned long)*8, sizeof(unsigned int)*8>(b);
    }
};

Will there be any difference with this code (without the template method)?

template<size_t bits> class IntegerFactoryImpl : public IntegerFactory<Integer<bits>>{

public:

    virtual ~IntegerFactoryImpl() throw(){}

    virtual Integer<bits> getOne() const{
        return Integer<bits>();
    }

    virtual Integer<bits> getOne(const Bits& b) const{

        Integer<bits> integer = this->getOne();
        size_t roof = (b.blocks() > integer.size()*(sizeof(unsigned long)/sizeof(unsigned int)))? integer.size()*(sizeof(unsigned long)/sizeof(unsigned int)) : b.blocks();
        for(size_t i = 0; i < roof; ++i){
            if(i % (sizeof(unsigned long)/sizeof(unsigned int)) == 0){
                integer.at(i/(sizeof(unsigned long)/sizeof(unsigned int))) = 0; // clear each destination block once, before OR-ing into it
            }
            for(size_t j = 0; j < (sizeof(unsigned long)/sizeof(unsigned int)); ++j){
                if(i % (sizeof(unsigned long)/sizeof(unsigned int)) == j){
                    integer.at(i/(sizeof(unsigned long)/sizeof(unsigned int))) |= ((unsigned long)b.block(b.blocks()-i-1)) << ((sizeof(unsigned int)*8)*j);
                    break;
                }
            }
        }
        for(size_t i = roof; i < integer.size()*(sizeof(unsigned long)/sizeof(unsigned int)); ++i){
            if(i % (sizeof(unsigned long)/sizeof(unsigned int)) == 0){
                integer.at(i/(sizeof(unsigned long)/sizeof(unsigned int))) = 0;
            }
        }
        return integer;
    }
};

(edit: I just discovered the code doesn't work well (I fixed it) but the original question still applies..)

What is the problem you are trying to solve where this is your solution? — GManNickG, Feb 21 '13 at 21:50
See http://stackoverflow.com/questions/582302/are-there-optimized-c-compilers-for-template-use — Mihai8, Feb 21 '13 at 21:52
Plus, is there a C++ data type meaning "I'm an integer the size of your processor's registers". Hmmm, "int"? — SeedmanJ, Feb 21 '13 at 21:52
No, this is what I coded to properly handle 32/64-bit compilation outputs. My question is: "Do I have to write code for every pair combination, or will the first solution (above) be as efficient?" — Mmmh mmh, Feb 21 '13 at 21:55
I actually use **longs** for the purpose. They are at least 32 bits long, and 64 bits on AMD64, so they fit, but they aren't exactly an "integer the size of your processor's registers": on a 16-bit processor they would still be at least 32 bits. — Mmmh mmh, Feb 21 '13 at 21:59
@AurélienOoms - `int` is supposed to be "the natural size" for the target architecture. Ask your compiler vendor what they think that means. — Pete Becker, Feb 21 '13 at 21:59
`int` isn't guaranteed to be the same size as your registers, but in practice it often is the same size as your "regular" registers. Since your 64-bit AMD processor is really just an x86 (32-bit) architecture with a 64-bit extension, it still makes sense for `int` to be 32-bits since you can still consider that to be a "natural" size. — Cornstalks, Feb 21 '13 at 21:59
@GManNickG the problem is to build an Integer instance from a Bits instance, where Bits uses **uint** blocks while Integer uses **ulong** blocks. — Mmmh mmh, Feb 21 '13 at 22:02
@AurélienOoms: You're missing the point of my question: what for? Think outside the code, what's the real-world issue? — GManNickG, Feb 21 '13 at 22:07
@AurélienOoms: The only guarantee about `long` is that `sizeof(long) >= sizeof(int)` and that it can hold at least as much as an `int` can. As far as I know, nothing ties `long` to any kind of "natural" register size. — Cornstalks, Feb 21 '13 at 22:07
@GManNickG Is this what you want: "**uint** and **ulong** have platform-dependent sizes, and I want to cover as many platforms as I can." — Mmmh mmh, Feb 21 '13 at 22:10

score 1 · Accepted Answer · answered Feb 21 '13 at 22:12

Right, the compiler will optimise away anything that it can calculate at compile time, and if you have a loop that only iterates once (e.g. `for(i = 0; i < 1; i++)`), it will remove the loop completely.
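
As a minimal sketch of that point (not the asker's actual classes): `pack_blocks`, `kRatio`, `kIntBits` and the use of `std::vector` are names and choices assumed here for illustration. Because `sizeof` is a constant expression, the ratio between the two block sizes is fixed at compile time whether it is written inline or passed as a template argument, so divisions and loop bounds built from it can be folded. The sketch also swaps the inner `j` loop from the question for a direct `i % kRatio`, which selects the same shift.

```cpp
#include <cassert>
#include <climits>
#include <cstddef>
#include <vector>

// Number of unsigned int blocks that fit in one unsigned long block.
// sizeof is a constant expression, so this value is fixed at compile time
// (typically 1 on 64-bit Windows and 2 on 64-bit Linux).
constexpr std::size_t kRatio = sizeof(unsigned long) / sizeof(unsigned int);
constexpr std::size_t kIntBits = sizeof(unsigned int) * CHAR_BIT;

static_assert(kRatio >= 1, "unsigned long must be at least as wide as unsigned int");

// Hypothetical stand-in for building an Integer from a Bits object:
// `src` holds unsigned int blocks, most significant first; the result
// holds `words` unsigned long blocks, least significant first.
inline std::vector<unsigned long> pack_blocks(const std::vector<unsigned int>& src,
                                              std::size_t words) {
    std::vector<unsigned long> out(words, 0UL);
    const std::size_t capacity = words * kRatio;
    const std::size_t roof = src.size() < capacity ? src.size() : capacity;
    for (std::size_t i = 0; i < roof; ++i) {
        // The divisor is a compile-time constant, so an optimizer can turn
        // i / kRatio and i % kRatio into shifts and masks, or drop them
        // entirely when kRatio == 1.
        const unsigned long block = src[src.size() - i - 1];
        out[i / kRatio] |= block << (kIntBits * (i % kRatio));
    }
    return out;
}
```

Whether a given compiler actually performs these simplifications depends on the optimization level, so it is worth checking the generated assembly rather than taking it on faith.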

As to integer sizes, whether it's better to use long or int really depends on what you are trying to achieve. In x86-64, for example, a 64-bit operation takes an extra prefix byte (the REX prefix) to indicate that the instruction following is a 64-bit instruction instead of a 32-bit one. If the compiler made int 64 bits long, the code would become (a little bit) larger, and thus fit less nicely in caches. In x86-64 there is no speed difference between 16-, 32- and 64-bit operations for 99% of operations. Multiply and divide are some of the obvious exceptions: the bigger the number, the longer it takes to multiply or divide it (actually, it is the number of bits set in the number that affects the multiply time, and I believe the divide time as well).

Of course, if you are, for example, using the values to perform bitmask operations and such, using long will give you 64-bit operations (on platforms where long is 64 bits, such as x86-64 Linux; on 64-bit Windows long is still 32 bits), so the same work takes half as many operations. This is clearly an advantage. So it is "right" to use long in this case, even if it adds an extra byte per instruction.
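
To make the "half as many operations" point concrete, here is a rough sketch. The function name `xor_words`, the constants, and the use of `std::uint32_t`/`std::uint64_t` (instead of `unsigned int`/`unsigned long`, whose sizes vary by platform) are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// XOR `b` into `a` one word at a time and return how many word-sized
// XOR operations were performed. Word is any unsigned integer type.
template <typename Word>
std::size_t xor_words(std::vector<Word>& a, const std::vector<Word>& b) {
    assert(a.size() == b.size());
    std::size_t ops = 0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        a[i] ^= b[i];
        ++ops;
    }
    return ops;
}

// The same 256 bits stored as 32-bit words or as 64-bit words.
constexpr std::size_t kTotalBits = 256;
constexpr std::size_t kWords32 = kTotalBits / 32;  // 8 words, 8 XORs
constexpr std::size_t kWords64 = kTotalBits / 64;  // 4 words, 4 XORs

static_assert(kWords32 == 2 * kWords64, "64-bit words halve the operation count");
```

This only counts source-level operations. An optimizer may vectorize either loop, so the real speed-up should be measured rather than assumed.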

Also bear in mind that int is very often used for "smaller numbers", so for a lot of things the extra size of a 64-bit int would simply be wasted and would take up extra data-cache space. So int also remains 32 bits to keep large integer arrays and the like at a reasonable size.
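
On the side question of a register-sized integer type: as far as I know the standard does not define one. The sketch below shows the usual approximations from `<cstddef>` and `<cstdint>`. The alias `machine_word` and the helper `bit_width_of` are hypothetical names, and picking `std::uintptr_t` assumes that pointers are register-sized, which holds on common flat-memory platforms but is not guaranteed.

```cpp
#include <cassert>
#include <climits>
#include <cstddef>
#include <cstdint>

// Hypothetical alias: treat the pointer-sized unsigned integer as the
// "machine word". This is an assumption, not a guarantee of the standard.
using machine_word = std::uintptr_t;

// Width of a type in bits, computed at compile time.
template <typename T>
constexpr std::size_t bit_width_of() {
    return sizeof(T) * CHAR_BIT;
}

// Minimum widths that the standard does guarantee:
static_assert(bit_width_of<unsigned int>() >= 16, "int has at least 16 bits");
static_assert(bit_width_of<unsigned long>() >= 32, "long has at least 32 bits");
static_assert(bit_width_of<std::uint_fast32_t>() >= 32,
              "uint_fast32_t is the fastest type with at least 32 bits");
```

On a typical x86-64 Linux build `bit_width_of<machine_word>()` and `bit_width_of<unsigned long>()` are both 64, while `bit_width_of<unsigned int>()` is 32, which is why `unsigned long` happened to work as the wide block type in the question.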

Mats Petersson

So the compiler is supposed to know the sizeof() return value? – Mmmh mmh Feb 21 '13 at 22:15
Plus, is there a difference between this implementation and another one where I simply replace (see edit above) the sizeOfxxx template arguments by (sizeof(xxx)*8)? – Mmmh mmh Feb 21 '13 at 22:21
Yes, the compiler certainly knows what `sizeof` produces. And it knows how to multiply, divide, add and subtract integers (and usually floating point) too. – Mats Petersson Feb 21 '13 at 22:26
Thanks. As for the performance question, Integer is indeed used for multi-precision arithmetic, and I can attest to the performance gain of using **ulongs** instead of **uints** on AMD64. – Mmmh mmh Feb 21 '13 at 22:35
Yes, obviously, doing half as many operations beats having slightly larger code... The point above is more about the fact that many times when we use `int`, the size of it doesn't need to be more than (or even as much as) 32 bits, and thus making `int` 64-bit everywhere would just waste code and data space for no extra benefit. – Mats Petersson Feb 21 '13 at 22:40
Another question: "Assume my processor worked with 64-bit-only registers. Isn't there an overhead added for using 32-bit *or lower* arithmetic (due to masking overflowing bits)?" – Mmmh mmh Feb 21 '13 at 22:59
If it's an x86-64, then it will be able to do almost everything in 32 or 64-bit with no difference in speed, or any extra instructions either way. There are a few exceptions, but they can pretty much be ignored. If you are after saving every last clock-cycle, using unsigned for array index will save the odd instruction in some cases. – Mats Petersson Feb 22 '13 at 00:47
