Is there an easy and efficient way to dynamically change a data type once that data type's limit has been reached?

Question

I would like to increase the size of a data type once that data type's limit has been reached. For example, let's say I have a class:

struct Counter {
    unsigned short x;
    void IncrementCount() { x++; }
};

Once I change the value of Counter.x, I would like it to promote x to be an unsigned int instead of an unsigned short.

int main() {
    Counter c;
    c.x = ~0;  // c.x is max unsigned short value 2^16
    c.IncrementCount();     // c.x is now an unsigned int with value 2^16 + 1, NOT an unsigned short with value 0
}

The reason for me doing this is so that I store as little memory in my Counter class as possible.

Obviously I want as little impact to performance and readability as possible.

I've considered using a void pointer and casting to the appropriate data type (but how would I track the type without using extra memory?), changing the IncrementCount() method on my Counter class and creating a new Counter object (with an int instead of a short) when the unsigned short's limit is reached (but this will add some complexity, as well as an extra if on every increment). Maybe a bitmap which increases in size every time it needs an extra bit? Still adds complexity and some performance hit.

If I have 10 million Counters, I don't want to use an extra 2 bytes for an int (as that would require 20 million extra bytes). I know alignment might fix the issue for short -> int but I want this to also be valid for int32 -> int64 and other types. It is also worth noting that this is a runtime problem (I don't know the size at compile time).

Is there a simpler way in C++ to do this?

Why not make it a template and specify what the underlying type should be? That way if you know you need a small counter you use a small type. If you need a big counter or unknown then you use a larger type? — NathanOliver, Sep 07 '16 at 14:12
Just use `unsigned int` from the start. Is there an actual practical issue you are trying to solve? Your question sounds suspiciously like [an XY problem](http://xyproblem.info/). — Igor Tandetnik, Sep 07 '16 at 14:14
I should add, I don't know how much the count is going to be incremented at compile time, so I can't just specify a template parameter at compile time. I need a dynamic resizing at runtime. — Andrew, Sep 07 '16 at 14:15
@Igor Tandetnik If I have 10 million Counters, I don't want to use an extra 2 bytes for an int (as that would require 20 million extra bytes). I know alignment might fix the issue for short -> int but I want this to also be valid for int32 -> int64 — Andrew, Sep 07 '16 at 14:17
Any solution involving dynamic memory allocation and such is guaranteed to cost you much more than 2 bytes per instance. You can't magically get something for nothing. Besides, if `Counter` is expected to play clever games with storage, it can't just expose `x` as a public member like that and allow any odd piece of code to modify it. — Igor Tandetnik, Sep 07 '16 at 14:21
I have a feeling that the branch(es) you need to do this would hurt your performance more then using some space that is not needed. — NathanOliver, Sep 07 '16 at 14:31
You can't do this without some kind of flag that says if space is reserved or not ( attempt without http://ideone.com/mtVmoA ). This is because if the value goes to unsigned short max + 1 your unsigned short bits will be 00000000 and thus trying to check if you need to read an int or just the short by reading the short and determining it's value you'll get 0 instead of the actual value, max + 1. — Hatted Rooster, Sep 07 '16 at 15:34

WindyFields · Answer 1 · 2016-09-07T14:59:58.400

The only way to do that is just, as you said, to check each increment whether incremented values fits current type, if not - create new obj with say uint32 instead of uint16 and copy the old val. However, in this case if objects are polymorphic -> they must contain a pointer to a virtual table function which requires additional 8 byte memory.

Sure, you can consider storing count number in a different data structure rather then just a fundamental type, but most likely they contain some extra info which requires more memory (but the memory is vital in you case ) so that is not an option. Also you can't deal with a free store, because the pointer is 32-64 bits (depending on the machine) and it's the same as storing a uint64.

The point of all this is that you can't avoid copying the same reason you have to copy when reallocating a memory for a vector. You can never now what is located next to your counter in the memory -> now way to just "resize" a type.

How you can figure out that the counter number is too big for current type and new type is needed? Very simple, just check x + 1 value for being equal to 0. If that happened -> overflow took place -> current type is to small.

To summarize, copying is the only way you can "resize" an object. But if you agree to do that, you will need to write some code for checking and copying and also you can't avoid value checking overhead each increment.

score 1 · Answer 2 · edited May 23 '17 at 10:33

Data types in C and C++ have to be totally defined at compile time. You can't have a short variable that then gets promoted to an int.

Programming languages like Python attach the data type to values, not variables. That's the reason why you can do:

a = 1
a = "hi"

Because the data type is attached to the value 1 and then to the value "hi".

The price to pay for this is that the bookkeeping for "a" is usually high, with at least one pointer to a dynamically allocated block of memory, bookkeeping for dynamic memory allocation, and values have a data type discrimination mechanism to get the value of a data type. This mechanism allows to infer data types at runtime at the cost of lower runtime efficiency.

There are at least two ways to implement this. Historically this has been done as a variant data type. See here for more information. The alternative is to do it as an object oriented way, where you have a base class Object with all possible operations. This is more or less what Python does, but in C instead with C++ and with function pointers. For example:

class Object {
public:
  virtual Object_ptr add(Object_ptr other) = 0;
};

For example the + operation for ints is the usual arithmetic add, where 2+2=4 but for strings is the concatenation operation, where "Hello " + "World" = "Hello World". Following the same logic we can do:

class IntObject {
public:
  virtual Object_ptr add(Object_ptr other) {
    if (other->dataType() == DATATYPE_INT) {
      return new IntObject(this->value + other->value);
    } else {
      raiseError("dataTypeError");
    }
  }
};

Python also have the nice feature of long-arbitrary size integers. You can have a value with as many bits as memory you have. For example in Python:

>>> 1<<128
340282366920938463463374607431768211456L

Internally, Python detects that the number won't fit in an int32/int64 and will upgrade the data type at runtime. This is what more or less Python does internally:

class IntObject {
public:
  virtual Object_ptr add(Object_ptr other) {
    if (other->dataType() == DATATYPE_INT) {
      if (operationWillOverflow(other)) {
        auto a = new LongIntObject(this);
        auto b = new LongIntObject(other);
        return a.add(b);
      } else {
        return new IntObject(this->value + other->value);
      }
    } else {
      raiseError("dataTypeError");
    }
  }
};

What do you think of the idea of returning a new Object that holds a different data type (for example, exact copy of Counter class with a int instead of a short) when count would reach the data type's limit? — Andrew, Sep 07 '16 at 15:04
In C++ instances of classes with virtual methods have a penalty of at least 4/8 bytes for the vtable. And then you have to pay a pointer to the instance. So you save 2 bytes but you need to pay 16 bytes more. And also you need to pay for the extra code that gets executed. Also virtual method calls are usually 10x slower. — vz0, Sep 07 '16 at 15:06
But if the Counter was already a part of an inheritance with virtual methods, would those penalties be nullified? Obviously not the same use case as my simple example, but it is worth noting. — Andrew, Sep 07 '16 at 15:17

score 0 · Answer 3 · answered Sep 07 '16 at 14:43

C++ is a statically typed language. Every data structure has a compile-time size which cannot be changed.

You can have a class allocate more memory. But that is a separate allocation from the class's own storage (and as noted in the comments, it would be just as memory costly if not moreso than an int). A specific C++ class will only ever have a single size.

score 0 · Answer 4 · answered Sep 07 '16 at 14:47

Have a look at your data storage policy for the 20 million values. There are two basic ways:

If you store objects in an array their size and type is set in stone. That is Nicol's answer. It's what has been recommended in the comments: Use the largest integer type needed and be done.

If you store pointers to your objects you are free to replace an object pointed to with another type which has a larger value space. But you have a memory overhead built-in from the start, probably an overhead of two pointers per value (the second one is the hidden vtable pointer inside the polymorphic object). That doesn't make sense.

If you want to change the underlying integer type of the same object you must allocate the integer dynamically and hold a pointer to that memory in your object; you just moved the pointer (this time to an integer object of various sizes) into the object, with the same overhead, plus the memory overhead for the heap administration which becomes dominant for small little chunks like here.

Is there an easy and efficient way to dynamically change a data type once that data type's limit has been reached?

4 Answers4