0

Let us say I'm initializing a vector vector<bool> V(n);. Is there a way I can know if V[n] is initialized or not? I need this for dynamic programming purposes. If the V[n] is initialized, I would utilize the value V[n] to obtain the result. If it's not initialized yet, I'd apply a function foo(.., n) or something to obtain the value of V[n]. I am asking this because I don't want to initialize a vector<int> V(n, -1); with 3 states like -1(for unassigned, or yet to find), 0(for false), and 1(for true). Instead, if there is a way to know if a variable V[n] is unassigned, I may be able to save some space for large values of n.

madhan01
  • 117
  • 5
  • 4
    For `std::vector` every element is initialized. You can't know if it still has its initial value or if it has been assigned a value (possible the same as the initial value). You need a type that can represent at least 3 states, possibly `int` like you suggest or `std::optional`. – François Andrieux Nov 09 '22 at 16:29
  • For a vector it is guaranteed that each member is initialized to it's default state. Other than that there's no way to tell except for wrapping the value in `optional`. – Mark Ransom Nov 09 '22 at 16:29
  • All objects are initialized when defined. They might be initialized to garbage, but they are intialized. – NathanOliver Nov 09 '22 at 16:30
  • `std::vector>`? But `sizeof(std::optional) > sizeof(int)`. – Evg Nov 09 '22 at 16:31
  • 1
    *You* need to keep track of what has been initialized. You cannot ask a variable "have you been initialized?". – Jesper Juhl Nov 09 '22 at 16:38
  • How could you save *space* by knowing whether an object has been initialized? – molbdnilo Nov 09 '22 at 16:43
  • 1
    vector of bool is not a good idea - it is broken in several ways. it seems you want something like vector of three-state – Neil Butterworth Nov 09 '22 at 16:51
  • @molbdnilo I would be saving space by using a bool vector instead of an int vector. I don't need to use vector and initialize them to -1 if I have a way of knowing if the object has been initialized. – madhan01 Nov 09 '22 at 16:54
  • @NeilButterworth Yes. Does something like that exist? like some datatype with sizeof(dtype)=1. – madhan01 Nov 09 '22 at 16:58
  • 1
    Elaborating on @JesperJuhl , possibly create a 2nd bool vector VInitialized where you maintain the "is initialized" state as a bool vector (vs. a single int vector). It saves space, but you need to "keep it around" with the actual vector. – franji1 Nov 09 '22 at 17:07
  • 3
    @fran i don't see how this saves space at all - rather the reverse. i would sugest a vector of char, using -1 to indicate "i don't know". and don't use vectors of bool – Neil Butterworth Nov 09 '22 at 17:23
  • 1
    The issues with `std::vector` can be found [here](https://stackoverflow.com/q/17794569/10107454). – paleonix Nov 09 '22 at 17:31
  • `vector` by definition only stores one bit per element. Storing the initialized state would require another bit; clearly this is not possible without a different or additional data structure. I like @NeilButterworth's suggestion of `vector`. – Mark Ransom Nov 09 '22 at 17:36

1 Answers1

1

Just gathering all the comments into a readable answer.

All the members of a vector that exist are intialised, so to solve the problem we really need to represent 3 states, Uninitialised, False, True, and create the entries as Uninitialised. We would want the vector to initially contain nodes in state Uninitialised.

So how best to represent this tristate? Considerations: Code maintainability; speed of access; memory usage.

vector<bool> is a special implementation of vector which /may/ be optimised to store more than 1 value per byte. It is possible to squeeze 8 bool bits into a byte. So a vector of 1000 bool will only use 125 bytes.

If you create any other vector of data, it will store an object of the size of that data type, so char, for example, or more exactly a vector<int8_t>, would use 1 byte per entry. 1000 chars would use 1000 bytes.

A vector<int> would use a number of bytes per entry, probably at least 4, so would cost 4000 bytes to hold 1000 elements.

But you would only be using 3 of the possible 255 states in a char, so using a vector of char would be more efficient than a vector of int, but is still somewhat wasteful of storage vs the vector<bool>. You might not care about that, and that is a fair approach. The code generated by vector<bool> is more complex than for the normal vector, so your code would be slower..

Let's go mad and use an enum:

  enum class State: int8_t
  {
    uninitialised = -1,
    False: 0,
    True: 1
  };

  std::vector<State> V(n,State::uninitialised);

But what about vector<bool> ?

The tighter forms suggested are to use 2 vectors of bool, one to say whether the entry is valid and the second to say that its value is set. This will cost 2*125 bytes, or 256 bytes for 1000 entries. That is still a saving over a vector of char.

Or you could write your own wrapper for vector where you treat 2 consecutive entries as the valid and set flags, and you allocate it twice as large as you wanted. This has the advantage of locality of reference, and potentially the optimiser can somewhat merge consecutive questions "is it valid" then "is it set".

So you save some storage, for the cost of some additional complexity (speed loss). You could wrap this in a class with accessors to hide the complexity.

If you were going to do that, you could write your own wrapper round a vector<unit8_t> which divides the input index by 4 and splits the stored value into 4 2-bit tri-state values. This would possibly be slightly faster overall, as you would not separately ask the vector "is it valid" then "is it set".

You /could/ squeeze more than 4 tristates into a byte - you can get 5, but that generates very slow code all round. The compiler knows how to divide by 4 very efficiently, and is less able to quickly divide by 5, or by powers of 3.

These days we tend to choose speed and simplicity over space saving, so do the vector<bool> thing for fun if you like, but stick with the vector of char.

That's all good.

I guess the other question I have to ask, though, is under what conditions is an entry invalid? Are they made valid sequentially? If the number of valid entries an indication that the higher indices are not yet valid?

In which case you could just start with an empty vector<bool> and push new values onto it as you need them - use index < size() to decide if the current index is valid or not? You can use reserve() to avoid the vector reallocating as it grows. This saves half the required storage, and keeps the code complexity manageable, so it is worth considering.

Of course in your case initialised/validity may be a completely random state in which case this is not an option for you.

Gem Taylor
  • 5,381
  • 1
  • 9
  • 27